Implementation Guide: DirektVermittlungDe

While the API Documentation explains how to use the API, this guide explains how to build it. It focuses on the backend engineering concerns defined in the architecture: encryption handling, database schema, and performance patterns.


Status: Draft v1.0
Target Audience: Backend Engineering Team
Context: Implements constraints from DvdArchitektur.txt [cite: 6]


1. Technology Stack & Setup

Based on the architectural constraints [cite: 45, 46, 47], the recommended reference stack is:

  • Service Layer: Java (Spring Boot 3.x) or Go (Gin/Echo) for high-concurrency performance.
  • Primary Database: PostgreSQL 15+ (Relational data for Routing/Threads).
  • Blob Storage: S3-Compatible Storage (AWS S3 / MinIO) for encrypted PDF payloads.
  • Cache/PubSub: Redis 7.x (Session store, Rate limiting, Async Job queues).

Project Structure (Bounded Contexts)

Organize the codebase into modules matching the architecture [cite: 43]:

  • dvd-intake-service: Handles /documents and Metadata extraction.
  • dvd-communication-service: Handles /threads and /messages.
  • dvd-routing-engine: The logic component for assigning units.
  • dvd-export-worker: Async background workers for eAkte exports.

2. Security Implementation Details

2.1 Handling "Blind" E2E Encryption [cite: 27]

The backend must not attempt to decrypt the encryptedPayload.

  • Ingest: Receive the encryptedPayload (Base64 or binary multipart) and stream it directly to S3 Blob Storage. Never buffer the full file in RAM; large uploads would otherwise risk out-of-memory (OOM) failures.
  • Metadata: Persist only the metadata JSON object to PostgreSQL, where the routing logic reads it.
  • Validation: Verify that the encryptedPayload is a well-formed encrypted container (e.g., check the OpenPGP packet header or the magic bytes of the agreed container format; raw AES ciphertext carries no header of its own) but treat the content as opaque.
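
The ingest rule above can be sketched as follows. This is a minimal illustration that assumes generic file-like objects in place of a real S3 client; `stream_to_sink` and `CHUNK_SIZE` are hypothetical names, not part of the architecture. The payload is copied in fixed-size chunks, so memory use stays bounded, and the ciphertext is only hashed, never parsed or decrypted.

```python
import hashlib
import io
from typing import BinaryIO, Tuple

CHUNK_SIZE = 64 * 1024  # assumption: 64 KiB per read; tune for the real storage client


def stream_to_sink(source: BinaryIO, sink: BinaryIO,
                   chunk_size: int = CHUNK_SIZE) -> Tuple[int, str]:
    """Copy an opaque encrypted payload chunk by chunk.

    Returns (bytes_written, sha256_hex). The content is treated as opaque.
    """
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read signals end of stream
            break
        sink.write(chunk)
        digest.update(chunk)
        total += len(chunk)
    return total, digest.hexdigest()
```

In a real service the sink would be a multipart upload to the blob store, and the returned size and digest could be stored next to storage_path as an integrity record.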

2.2 Stateless Authentication (OAuth2) [cite: 44]

  • Gateway Layer: Implement a centralized API Gateway (e.g., Spring Cloud Gateway / Nginx) that validates JWT signatures (JWKS) from BundID (Citizens) and Authority-IDP (Officials).
  • Context Propagation: Extract the sub (User ID) and scope from the JWT and pass them to downstream microservices via HTTP Headers (e.g., X-User-Id, X-User-Role).
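
A minimal sketch of the context propagation step, assuming the gateway has already verified the JWT signature and holds the claims as a dictionary. The function name `build_identity_headers` and the convention that a scope named "official" marks authority staff are assumptions made for illustration; the real scope names come from the identity providers.

```python
from typing import Dict, Mapping


def build_identity_headers(claims: Mapping[str, object]) -> Dict[str, str]:
    """Map already-validated JWT claims to downstream identity headers."""
    sub = claims.get("sub")
    if not isinstance(sub, str) or not sub:
        raise ValueError("validated token is missing the 'sub' claim")

    # OAuth2 scopes are a space-delimited string (RFC 6749, section 3.3)
    scope = claims.get("scope", "")
    scopes = scope.split() if isinstance(scope, str) else []

    # Assumption: a scope named "official" identifies authority staff
    role = "OFFICIAL" if "official" in scopes else "CITIZEN"
    return {"X-User-Id": sub, "X-User-Role": role}
```

Because downstream services trust these headers, the gateway should overwrite any X-User-Id or X-User-Role values supplied by the client rather than forward them.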

3. Database Schema Recommendations (PostgreSQL)

Map the domain objects [cite: 64] to the following relational schema.

Table: documents

CREATE TABLE documents (
    id UUID PRIMARY KEY,
    reference_number VARCHAR(50) NOT NULL, -- "Aktenzeichen"
    authority_id VARCHAR(50) NOT NULL,     -- Routing target
    status VARCHAR(20) DEFAULT 'RECEIVED', -- RECEIVED, ROUTED, ASSIGNED
    storage_path VARCHAR(255) NOT NULL,    -- S3 Key for encrypted blob
    created_at TIMESTAMPTZ DEFAULT NOW(),
    retention_date TIMESTAMPTZ             -- For GDPR auto-deletion [cite: 14]
);
CREATE INDEX idx_docs_authority ON documents(authority_id, status);

Table: threads

CREATE TABLE threads (
    id UUID PRIMARY KEY,
    document_id UUID REFERENCES documents(id),
    type VARCHAR(20) NOT NULL,             -- CHAT, CALLBACK, APPOINTMENT
    assigned_official_id VARCHAR(100),     -- Nullable until claimed
    last_activity_at TIMESTAMPTZ
);

Table: messages

CREATE TABLE messages (
    id UUID PRIMARY KEY,
    thread_id UUID REFERENCES threads(id),
    sender_role VARCHAR(20) NOT NULL,
    content_blob TEXT NOT NULL,            -- Encrypted content
    created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Efficient Cursor Pagination: Index on (thread_id, created_at)
CREATE INDEX idx_msgs_thread_time ON messages(thread_id, created_at DESC);
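
The cursor pagination that this index supports could look like the sketch below. It uses an in-memory SQLite database purely as a runnable stand-in for PostgreSQL, and `fetch_page` is a hypothetical helper. Breaking ties on `id` is an added assumption: the index above covers only (thread_id, created_at), and two messages with the same timestamp would otherwise be skipped or repeated across pages.

```python
import sqlite3

PAGE_SQL = """
SELECT id, created_at FROM messages
WHERE thread_id = :thread_id
  AND (:cursor_ts IS NULL
       OR created_at < :cursor_ts
       OR (created_at = :cursor_ts AND id < :cursor_id))
ORDER BY created_at DESC, id DESC
LIMIT :limit
"""


def fetch_page(conn, thread_id, limit, cursor=None):
    """Return (rows, next_cursor) for one page, newest messages first.

    The cursor is the (created_at, id) pair of the last row already seen;
    next_cursor is None once a short page signals the end of the thread.
    """
    cursor_ts, cursor_id = cursor if cursor else (None, None)
    rows = conn.execute(
        PAGE_SQL,
        {"thread_id": thread_id, "cursor_ts": cursor_ts,
         "cursor_id": cursor_id, "limit": limit},
    ).fetchall()
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor
```

Unlike OFFSET-based paging, each query seeks straight to the cursor position, so the cost of a page does not grow with how far the client has scrolled.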

4. Performance & Scalability Patterns

4.1 Rate Limiting (Redis Token Bucket) [cite: 24]

To protect against DDoS and ensure fair usage (NFR-2), implement specific limits:

  • Citizens: 10 requests/minute (prevent spamming threads).
  • Officials: 1000 requests/minute (allow rapid batch processing).

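Implementation Tip: Run the check atomically in Redis with a Lua script, using either the token bucket named above or a "Sliding Window" counter. The two are distinct algorithms: a sliding window bounds requests per interval strictly, while a token bucket tolerates short bursts. Key format: rate_limit:{user_id}.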
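
To make the sliding-window logic concrete, here is a minimal in-memory sketch. It is not the Redis Lua script itself: a production version would keep the timestamps in Redis (for example in a sorted set) so that all service instances share one counter. The class name `SlidingWindowLimiter` and the injectable clock are assumptions for illustration and testing.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Sliding-window log limiter; an in-memory stand-in for the Redis script."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        hits = self._hits[f"rate_limit:{user_id}"]
        # Drop timestamps that have slid out of the window
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller should answer 429 Too Many Requests
        hits.append(now)
        return True
```

With the limits stated above, a citizen limiter would be built as `SlidingWindowLimiter(limit=10, window_seconds=60)` and an official limiter as `SlidingWindowLimiter(limit=1000, window_seconds=60)`.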

4.2 Caching Strategy [cite: 47]

  • Routing Rules: Cache RoutingRules in Redis for 1 hour. Invalidate immediately on Admin updates.
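  • ETags: For GET /documents/{id}, generate an ETag from the document's updated_at timestamp and return 304 Not Modified when the client's If-None-Match header matches, which saves bandwidth. The documents table in section 3 does not define updated_at, so add that column or derive the ETag from another change marker.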
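
The ETag check can be sketched in a few lines. The helper names `make_etag` and `conditional_status` are hypothetical, and hashing the id together with the timestamp is one possible scheme rather than a requirement of the architecture.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def make_etag(document_id: str, updated_at: datetime) -> str:
    """Derive a quoted ETag from the document id and last-modified timestamp."""
    raw = f"{document_id}:{updated_at.isoformat()}".encode()
    return '"' + hashlib.sha256(raw).hexdigest()[:16] + '"'


def conditional_status(if_none_match: Optional[str], etag: str) -> int:
    """Return 304 if the client's cached tag matches the current one, else 200."""
    if if_none_match is None:
        return 200
    # If-None-Match may list several tags and uses weak comparison,
    # so a leading W/ prefix is ignored
    candidates = [tag.strip().removeprefix("W/") for tag in if_none_match.split(",")]
    return 304 if etag in candidates or "*" in candidates else 200
```

A route handler would compute the tag, call `conditional_status` with the incoming If-None-Match header, and send an empty body together with the ETag header when the result is 304.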

5. Async Export Workflow [cite: 16]

For the POST /exports endpoint:

  1. API Layer: Validate request -> Publish event ExportRequested to RabbitMQ/Redis Stream -> Return 202 Accepted + jobId.
  2. Worker:
    • Consume ExportRequested.
    • Fetch encryptedPayload from S3.
    • Fetch Message History from Postgres.
    • Note: The Worker might need a special "Authority Key" to re-encrypt the package for the target eAkte system, depending on the specific crypto-concept.
    • Push result to the Authority's Ingest Interface.
    • Update Job Status to COMPLETED.
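
The hand-off between API layer and worker can be sketched with an `asyncio.Queue` standing in for RabbitMQ or a Redis Stream. Everything here is illustrative: the in-memory `jobs` dictionary replaces a jobs table, `deliver` is a hypothetical callback for the push to the authority's ingest interface, and the FAILED status is an assumption, since the steps above name only COMPLETED.

```python
import asyncio
import uuid
from typing import Dict, Tuple

jobs: Dict[str, str] = {}  # job_id -> status; stand-in for a persistent jobs table


async def request_export(queue: asyncio.Queue, document_id: str) -> Tuple[int, str]:
    """API layer: publish the event and answer immediately with 202 + jobId."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = "PENDING"
    await queue.put({"type": "ExportRequested", "job_id": job_id,
                     "document_id": document_id})
    return 202, job_id


async def export_worker(queue: asyncio.Queue, deliver) -> None:
    """Worker: consume events, push the package, record the outcome."""
    while True:
        event = await queue.get()
        try:
            await deliver(event["document_id"])
            jobs[event["job_id"]] = "COMPLETED"
        except Exception:
            jobs[event["job_id"]] = "FAILED"  # assumption: not named in the source
        finally:
            queue.task_done()


async def demo() -> Tuple[int, str]:
    queue: asyncio.Queue = asyncio.Queue()

    async def deliver(document_id: str) -> None:
        await asyncio.sleep(0)  # placeholder for S3 fetch, packaging, and push

    worker = asyncio.create_task(export_worker(queue, deliver))
    status, job_id = await request_export(queue, "doc-1")
    await queue.join()  # wait until the worker has finished the job
    worker.cancel()
    return status, jobs[job_id]
```

The client receives the 202 response before any export work starts and can poll the job status with the returned jobId.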

6. Definition of Done Checklist

Before deploying to the staging environment, ensure:

  • [ ] Load Test: System handles 500 concurrent document uploads without error [cite: 24].
  • [ ] Security Audit: Confirm no PII (Aktenzeichen) is logged in plaintext application logs.
  • [ ] Cleanup: The "GDPR Reaper" job is active and deletes documents where retention_date < NOW() [cite: 14].
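
One possible shape for the "GDPR Reaper" is sketched below, with an in-memory SQLite database as a runnable stand-in for PostgreSQL. `reap_expired` is a hypothetical name, and returning the storage paths is an assumption: the encrypted blobs live in S3, so deleting only the database row would leave the payload behind.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List


def reap_expired(conn: sqlite3.Connection, now: datetime) -> List[str]:
    """Delete documents past their retention date.

    Returns the storage paths of the deleted rows so the caller can
    remove the matching blobs from object storage.
    """
    cutoff = now.isoformat()
    rows = conn.execute(
        "SELECT id, storage_path FROM documents "
        "WHERE retention_date IS NOT NULL AND retention_date < ?",
        (cutoff,),
    ).fetchall()
    conn.executemany("DELETE FROM documents WHERE id = ?", [(r[0],) for r in rows])
    conn.commit()
    return [r[1] for r in rows]
```

Rows without a retention_date are left untouched, which matches the nullable column in the schema. This sketch compares ISO-8601 strings, so all timestamps must share one format and time zone; PostgreSQL's TIMESTAMPTZ comparison does not have that restriction.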

Implementation Guide: DirektVermittlungDe

Status: Draft v1.1
Stack: Python / FastAPI
Context: Implements constraints from DvdArchitektur.txt and ADR-007.

1. Technology Stack

  • Language: Python 3.11+
  • Web Framework: FastAPI (with Uvicorn + Gunicorn)
  • Validation: Pydantic V2 (Strict Mode)
  • Database ORM: SQLAlchemy (Async) or Tortoise-ORM
  • Task Queue: ARQ (Redis-based) or Celery
  • Primary DB: PostgreSQL 15+
  • Blob Store: MinIO / AWS S3

2. Project Structure & Patterns

Organize the monolithic repo or microservices using "Clean Architecture":

/src
  /domain       # Pydantic models & Business Rules (Pure Python)
  /adapters     # DB, S3, External APIs
  /service      # Application Logic
  /api          # FastAPI Routes
  /workers      # Background Job Definitions
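
To illustrate how the layers depend on each other, here is a minimal single-file sketch. All names (`DocumentMeta`, `DocumentRepository`, `register_document`) are hypothetical, and a dataclass replaces the Pydantic model only to keep the example free of third-party imports. The point is the direction of dependencies: the service layer knows the repository interface, never a concrete database.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class DocumentMeta:
    """/domain: pure data with no framework imports."""
    reference_number: str
    authority_id: str


class DocumentRepository(Protocol):
    """/domain: the port that the service layer depends on."""
    def save(self, meta: DocumentMeta) -> None: ...


class InMemoryDocumentRepository:
    """/adapters: one implementation; a PostgreSQL adapter would sit beside it."""
    def __init__(self) -> None:
        self.rows: List[DocumentMeta] = []

    def save(self, meta: DocumentMeta) -> None:
        self.rows.append(meta)


def register_document(repo: DocumentRepository, reference_number: str,
                      authority_id: str) -> DocumentMeta:
    """/service: application logic, testable without a database."""
    if not reference_number:
        raise ValueError("reference_number must not be empty")
    meta = DocumentMeta(reference_number, authority_id)
    repo.save(meta)
    return meta
```

A FastAPI route in /api would then construct or inject the concrete adapter and call `register_document`, so unit tests can pass the in-memory repository instead.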

## 3. The "Hybrid Concurrency" Pattern (Critical)

To meet NFR-1 (<300ms) and NFR-2 (10k sessions), you must not block the Event Loop.

### 3.1 The Rule

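- NEVER call time.sleep, requests, or heavy computation (e.g., pypdf, cryptography) directly inside an async def.
- ALWAYS use await for I/O.
- ALWAYS offload CPU-bound tasks with loop.run_in_executor, backed by a ProcessPoolExecutor; the default thread pool does not bypass the GIL for pure-Python work.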

### 3.2 Implementation Snippet

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

from fastapi import APIRouter, UploadFile

import some_heavy_crypto_lib  # placeholder name for the actual CPU-heavy library

router = APIRouter()
# Dedicated process pool for CPU-bound tasks
cpu_pool = ProcessPoolExecutor(max_workers=4)


def cpu_bound_decryption(payload: bytes) -> dict:
    # Runs in a separate process, so it does not hold the web worker's GIL
    return some_heavy_crypto_lib.decrypt_and_parse(payload)


@router.post("/documents")
async def upload_document(file: UploadFile):
    # Non-blocking read; note that this loads the whole upload into memory
    content = await file.read()

    # Offload the CPU work to the pool so the event loop stays responsive
    loop = asyncio.get_running_loop()
    metadata = await loop.run_in_executor(cpu_pool, cpu_bound_decryption, content)

    return {"status": "processed", "meta": metadata}
```

4. Security Implementation

4.1 "Blind" Ingest

  • Stream uploads directly to S3 using aiobotocore to avoid loading 50MB PDFs into RAM.
  • Do not attempt to read the encryptedPayload in the main web service process.

4.2 Auth Middleware

Use fastapi.security.OAuth2AuthorizationCodeBearer. Implement a dependency that validates the JWT signature using a cached JWKS (JSON Web Key Set) to avoid a network call on every request.
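
The JWKS caching idea can be sketched independently of any JWT library. `JwksCache`, the injected `fetch` callable, and the one-hour default TTL are assumptions for illustration; in a real service `fetch` would perform the HTTP request to the identity provider's JWKS endpoint.

```python
import time
from typing import Callable, Optional


class JwksCache:
    """Keep the key set in memory and refetch only after the TTL expires."""

    def __init__(self, fetch: Callable[[], dict], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._keys: Optional[dict] = None
        self._expires_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._keys is None or now >= self._expires_at:
            # The only point where a network call happens
            self._keys = self._fetch()
            self._expires_at = now + self._ttl
        return self._keys
```

A production version would usually also refetch once when a token carries an unknown key id, so that key rotation at the identity provider does not reject valid tokens until the TTL runs out.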

5. Database Schema (SQLAlchemy Async)

from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from sqlalchemy import String, UUID, DateTime
import uuid
from datetime import datetime

class Base(DeclarativeBase):
    pass

class Document(Base):
    __tablename__ = "documents"
    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    reference_number: Mapped[str] = mapped_column(String(50), index=True)
    status: Mapped[str] = mapped_column(String(20), default="RECEIVED")
    # …

6. Testing Strategy (Agentic TDD)

  • Framework: pytest + pytest-asyncio.
  • Mocking: Use respx for mocking external HTTP calls (Authority Systems).
  • Database: Use testcontainers-python to spin up a real Postgres for integration tests.
  • Prompting the Agent: "Write an async pytest for POST /documents. Use ProcessPoolExecutor mock to verify CPU offloading."
