# Workplan: Main Codebase Integration & TDD Setup

**Status:** Draft v1.0
**Date:** 2025-12-01
**Goal:** Consolidate prototype learnings into a production-ready main codebase with TDD practices and proper ADR documentation structure.

---

## 1. Current State Assessment

### Existing Assets
- **3 Prototype Implementations**: chatgpt5 (most complete), geminiNbt3pro, grok4.1
- **Documentation**:
  - Single `decisions.md` file with all ADRs (needs restructuring)
  - `implementation_guide.md` (comprehensive)
  - `api_docs.md` (scenario-based)
  - `openapi.yaml` (API spec)
- **Architecture**: Clean Architecture pattern established in chatgpt5 prototype

### Gaps
- No unified main codebase
- ADRs not in individual files (not easily referenceable)
- No TDD test suite defining core interfaces
- No CI/CD pipeline configuration
- Prototypes have overlapping implementations without consolidation

---

## 2. Target State Definition

### Main Codebase Structure
```
/
├── docs/
│   ├── architecture/
│   │   ├── adr/                    # Individual ADR files
│   │   │   ├── 0001-split-payload-model.md
│   │   │   ├── 0002-stateless-authentication.md
│   │   │   └── ...
│   │   ├── architecture-overview.md
│   │   └── design-patterns.md
│   ├── api/
│   │   ├── openapi.yaml
│   │   └── api-scenarios.md
│   └── development/
│       ├── testing-strategy.md
│       └── agentic-coding-guide.md
├── src/
│   ├── domain/          # Pure business logic (TDD tested)
│   ├── adapters/        # External integrations (mocked in tests)
│   ├── service/         # Application services (TDD tested)
│   ├── api/             # FastAPI routes (integration tested)
│   └── workers/         # Background jobs
├── tests/
│   ├── unit/            # TDD unit tests
│   ├── integration/     # Integration tests
│   ├── e2e/             # End-to-end workflow tests
│   └── fixtures/        # Test data and mocks
├── scripts/
│   └── init_db.py
├── pyproject.toml       # Dependencies & tool config
├── pytest.ini           # Test configuration
├── .github/
│   └── workflows/       # CI/CD pipelines
└── CLAUDE.md            # AI coding assistant guidance
```

### TDD Approach
- **Interface-first**: Define Pydantic models and service interfaces via tests
- **Red-Green-Refactor**: Write failing test → implement → refactor
- **Test pyramid**: Many unit tests, fewer integration tests, minimal e2e tests
- **Async testing**: Use pytest-asyncio for async operations

### ADR Documentation Standards
- **Format**: Follow Markdown Any Decision Records (MADR) template
- **Naming**: `NNNN-title-with-dashes.md` (e.g., `0001-split-payload-model.md`)
- **Location**: `/docs/architecture/adr/`
- **Template**:
  ```markdown
  # ADR-NNNN: [Title]

  **Status:** [Accepted|Proposed|Deprecated|Superseded]
  **Date:** YYYY-MM-DD
  **Deciders:** [Team/Role]

  ## Context and Problem Statement
  [What is the issue we're addressing?]

  ## Decision Drivers
  * [Driver 1]
  * [Driver 2]

  ## Considered Options
  * Option 1
  * Option 2

  ## Decision Outcome
  Chosen option: [option], because [rationale].

  ### Positive Consequences
  * [Consequence 1]

  ### Negative Consequences
  * [Consequence 1]

  ## Implementation Notes
  [Specific technical guidance]
  ```

---

## 3. Phase 1: Foundation Setup (Week 1)

### 3.1 ADR Restructuring
**Goal**: Extract individual ADRs from `decisions.md` into separate files.

**Tasks**:
- [ ] Create `/docs/architecture/adr/` directory
- [ ] Create ADR template file: `docs/architecture/adr/0000-template.md`
- [ ] Extract ADR-001 (Split-Payload Model) → `0001-split-payload-model.md`
- [ ] Extract ADR-002 (Stateless Auth) → `0002-stateless-authentication.md`
- [ ] Extract ADR-003 (Pagination) → `0003-cursor-based-pagination.md`
- [ ] Extract ADR-004 (Async Exports) → `0004-async-export-workflow.md`
- [ ] Extract ADR-005 (Resource Naming) → `0005-rest-resource-structure.md`
- [ ] Extract ADR-006 (Data Retention) → `0006-gdpr-retention-model.md`
- [ ] Extract ADR-007 (Python & ProcessPoolExecutor) → `0007-hybrid-concurrency-pattern.md`
- [ ] Create index file: `docs/architecture/adr/README.md` with ADR list and links
- [ ] Update `decisions.md` with deprecation notice and redirect to ADR directory

**Acceptance Criteria**:
- Each ADR is a standalone markdown file
- ADRs follow consistent template structure
- Index file allows easy navigation
- CLAUDE.md updated to reference new ADR location
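
The index file could be generated from the ADR filenames rather than maintained by hand. The following is a minimal sketch, assuming the `NNNN-title-with-dashes.md` convention from Section 2; the function name `build_adr_index` and the link format are illustrative choices, not an existing script.

```python
import re
from pathlib import PurePosixPath

# Matches the naming convention NNNN-title-with-dashes.md
ADR_NAME = re.compile(r"^(?P<num>\d{4})-(?P<slug>[a-z0-9]+(?:-[a-z0-9]+)*)\.md$")


def build_adr_index(filenames: list[str]) -> str:
    """Return Markdown list items linking to each ADR, sorted by number.

    Files that do not match the convention (for example README.md) and the
    0000 template are skipped.
    """
    entries = []
    for name in filenames:
        match = ADR_NAME.match(PurePosixPath(name).name)
        if match is None or match["num"] == "0000":
            continue
        # Hypothetical title rule: derive a readable title from the slug.
        title = match["slug"].replace("-", " ").capitalize()
        entries.append((match["num"], title, match.group(0)))
    return "\n".join(
        f"- [ADR-{num}: {title}]({filename})" for num, title, filename in sorted(entries)
    )
```

Titles derived from slugs lose capitalization (for example "GDPR"), so a real index may prefer to read the first heading of each file instead.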

### 3.2 Main Codebase Scaffolding
**Goal**: Create production-ready directory structure with tooling.

**Tasks**:
- [ ] Create `/src/` directory with Clean Architecture structure
- [ ] Initialize `pyproject.toml` with dependencies from prototype-chatgpt5
- [ ] Add dev dependencies: pytest, pytest-asyncio, pytest-cov, httpx, respx, mypy, ruff
- [ ] Create `pytest.ini` with async and coverage configuration
- [ ] Create `.gitignore` (Python, IDE, env files)
- [ ] Create `tests/` structure: unit/, integration/, fixtures/
- [ ] Create `scripts/` directory
- [ ] Set up pre-commit hooks configuration (ruff, mypy)

**Acceptance Criteria**:
- Directory structure matches target state
- `pip install -e ".[dev]"` works
- `pytest` runs (even with 0 tests)
- Type checking with `mypy src/` works

### 3.3 Documentation Organization
**Goal**: Restructure documentation for clarity and AI assistant consumption.

**Tasks**:
- [ ] Create `/docs/architecture/` directory
- [ ] Create `/docs/api/` directory
- [ ] Create `/docs/development/` directory
- [ ] Move `openapi.yaml` → `docs/api/openapi.yaml`
- [ ] Move `api_docs.md` → `docs/api/api-scenarios.md`
- [ ] Refactor `implementation_guide.md` → split into:
  - `docs/architecture/design-patterns.md` (architectural patterns)
  - `docs/development/testing-strategy.md` (TDD approach)
  - `docs/development/agentic-coding-guide.md` (LLM-specific guidance)
- [ ] Create `docs/architecture/architecture-overview.md` (high-level system design)
- [ ] Update CLAUDE.md with new documentation structure

**Acceptance Criteria**:
- Documentation is logically organized by concern
- Each document has a single, clear purpose
- CLAUDE.md points to correct locations

---

## 4. Phase 2: TDD Interface Definition (Week 2)

### 4.1 Domain Models (TDD)
**Goal**: Define core domain models with comprehensive test coverage.

**Approach**: Write tests first, then implement models.

**Tasks**:
- [ ] **Test**: `tests/unit/domain/test_document_metadata.py`
  - Test: Valid metadata creation
  - Test: Invalid authorityId (empty, too long)
  - Test: Invalid referenceNumber format
  - Test: Future issuedAt date validation
  - Implement: `src/domain/models.py` → `DocumentMetadata`

- [ ] **Test**: `tests/unit/domain/test_thread_models.py`
  - Test: ThreadType enum values
  - Test: SenderRole enum values
  - Test: ThreadCreateRequest validation
  - Test: Message model with encrypted content
  - Implement: Thread-related models

- [ ] **Test**: `tests/unit/domain/test_document_envelope.py`
  - Test: Split payload structure
  - Test: encryptedPayload validation (base64, size limits)
  - Test: Status transitions (RECEIVED → ROUTED → ASSIGNED → CLOSED)
  - Implement: DocumentEnvelope and status management

**Acceptance Criteria**:
- All domain models have >90% test coverage
- Pydantic validation catches invalid inputs
- Tests run in <1 second
- Zero mypy type errors
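
As an illustration of the test-first approach for status management, here is a small sketch of the RECEIVED → ROUTED → ASSIGNED → CLOSED lifecycle. It assumes the lifecycle is strictly linear and that CLOSED is terminal; the names `DocumentStatus`, `InvalidTransition`, and `transition` are hypothetical, and a plain `Enum` is used in place of the Pydantic model to keep the example dependency-free.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    RECEIVED = "RECEIVED"
    ROUTED = "ROUTED"
    ASSIGNED = "ASSIGNED"
    CLOSED = "CLOSED"


# Assumption: each status has exactly one successor and CLOSED has none.
_ALLOWED = {
    DocumentStatus.RECEIVED: {DocumentStatus.ROUTED},
    DocumentStatus.ROUTED: {DocumentStatus.ASSIGNED},
    DocumentStatus.ASSIGNED: {DocumentStatus.CLOSED},
    DocumentStatus.CLOSED: set(),
}


class InvalidTransition(ValueError):
    """Raised when a status change skips or reverses the lifecycle."""


def transition(current: DocumentStatus, target: DocumentStatus) -> DocumentStatus:
    if target not in _ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target


# Tests in the pytest style (plain functions and asserts), written first.
def test_happy_path_walks_full_lifecycle():
    status = DocumentStatus.RECEIVED
    for nxt in (DocumentStatus.ROUTED, DocumentStatus.ASSIGNED, DocumentStatus.CLOSED):
        status = transition(status, nxt)
    assert status is DocumentStatus.CLOSED


def test_skipping_a_step_is_rejected():
    try:
        transition(DocumentStatus.RECEIVED, DocumentStatus.ASSIGNED)
    except InvalidTransition:
        return
    raise AssertionError("expected InvalidTransition")
```

Under pytest the second test would more idiomatically use `pytest.raises(InvalidTransition)`.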

### 4.2 Service Layer Interfaces (TDD)
**Goal**: Define service contracts via protocol/ABC classes with tests.

**Tasks**:
- [ ] **Test**: `tests/unit/service/test_documents_service.py`
  - Mock: Database adapter
  - Mock: Storage adapter (S3)
  - Mock: Routing engine
  - Test: create_document() - happy path
  - Test: create_document() - routing failure
  - Test: create_document() - storage failure
  - Test: get_document() - found
  - Test: get_document() - not found
  - Test: Retention date calculation (default 90 days)
  - Implement: `src/service/documents_service.py`

- [ ] **Test**: `tests/unit/service/test_threads_service.py`
  - Test: create_thread() - links to document
  - Test: create_thread() - document not found (404)
  - Test: list_messages() - cursor pagination
  - Test: add_message() - role validation
  - Implement: `src/service/threads_service.py`

- [ ] **Test**: `tests/unit/service/test_exports_service.py`
  - Test: create_export_job() - returns jobId
  - Test: get_export_status() - job states (QUEUED, RUNNING, COMPLETED, FAILED)
  - Test: Async job enqueuing (mock Redis/ARQ)
  - Implement: `src/service/exports_service.py`

**Acceptance Criteria**:
- Service layer has clear, testable interfaces
- All external dependencies are mocked
- Tests verify business logic, not infrastructure
- Each service method has both success and failure test cases
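
The sketch below shows how a service test might mock all three adapters and cover both a success and a failure case. It is an assumption-laden illustration: the constructor shape, the `insert` method on the database adapter, the `StorageError` type, and the record fields are hypothetical; only `save_encrypted_payload`, `route_document`, and the 90-day default come from this plan.

```python
import asyncio
from dataclasses import dataclass
from datetime import date, timedelta
from unittest.mock import AsyncMock

DEFAULT_RETENTION_DAYS = 90  # default named in the task list above


class StorageError(Exception):
    """Hypothetical error raised by the storage adapter."""


@dataclass
class DocumentsService:
    # Adapters are injected so tests can replace them with mocks.
    db: object
    storage: object
    router: object

    async def create_document(self, doc_id: str, payload: bytes, received_on: date) -> dict:
        await self.storage.save_encrypted_payload(doc_id, payload)
        department = await self.router.route_document(doc_id)
        record = {
            "id": doc_id,
            "department": department,
            "retention_until": received_on + timedelta(days=DEFAULT_RETENTION_DAYS),
        }
        await self.db.insert(record)  # hypothetical DatabaseAdapter method
        return record


def make_service() -> DocumentsService:
    return DocumentsService(db=AsyncMock(), storage=AsyncMock(), router=AsyncMock())


def test_create_document_happy_path():
    service = make_service()
    service.router.route_document.return_value = "tax-office"
    record = asyncio.run(service.create_document("doc-1", b"cipher", date(2025, 1, 1)))
    assert record["department"] == "tax-office"
    assert record["retention_until"] == date(2025, 4, 1)  # 90 days after 2025-01-01
    service.db.insert.assert_awaited_once_with(record)


def test_create_document_storage_failure_skips_db_write():
    service = make_service()
    service.storage.save_encrypted_payload.side_effect = StorageError("S3 down")
    try:
        asyncio.run(service.create_document("doc-1", b"cipher", date(2025, 1, 1)))
    except StorageError:
        service.db.insert.assert_not_awaited()
        return
    raise AssertionError("expected StorageError")
```

With pytest-asyncio the tests would be `async def` functions that await the service directly; `asyncio.run` is used here only to keep the sketch runnable without plugins.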

### 4.3 Adapter Contracts (Protocols)
**Goal**: Define adapter interfaces using Python Protocols for mockability.

**Tasks**:
- [ ] Create `src/adapters/protocols.py`:
  - `StorageAdapter` protocol (save_encrypted_payload, get_encrypted_payload)
  - `DatabaseAdapter` protocol (CRUD operations)
  - `RoutingEngine` protocol (route_document)
  - `JobQueue` protocol (enqueue, get_status)

- [ ] Create stub implementations for testing:
  - `tests/fixtures/storage_stub.py` (in-memory storage)
  - `tests/fixtures/db_stub.py` (in-memory DB)

**Acceptance Criteria**:
- Protocols are narrow and focused
- Test fixtures implement all protocols
- Production adapters can be swapped without changing service layer
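
To make the Protocol approach concrete, here is a minimal sketch of a `StorageAdapter` protocol with an in-memory stub. The method names come from the task list; the parameter names and types (`key: str`, `payload: bytes`) and the `round_trip` helper are assumptions for illustration.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class StorageAdapter(Protocol):
    async def save_encrypted_payload(self, key: str, payload: bytes) -> None: ...
    async def get_encrypted_payload(self, key: str) -> bytes: ...


class InMemoryStorage:
    """Test stub; satisfies StorageAdapter structurally, without inheriting."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    async def save_encrypted_payload(self, key: str, payload: bytes) -> None:
        self._blobs[key] = payload

    async def get_encrypted_payload(self, key: str) -> bytes:
        return self._blobs[key]


async def round_trip(storage: StorageAdapter, key: str, payload: bytes) -> bytes:
    # Service code depends only on the protocol, never on a concrete adapter.
    await storage.save_encrypted_payload(key, payload)
    return await storage.get_encrypted_payload(key)
```

Because the stub never imports the production adapter, mypy can verify that both satisfy the same contract, which is what allows adapters to be swapped without touching the service layer. Note that `runtime_checkable` only checks that the methods exist, not their signatures.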

---

## 5. Phase 3: Prototype Integration (Week 3)

### 5.1 Comparative Analysis
**Goal**: Identify best implementations across prototypes.

**Tasks**:
- [ ] Analyze `prototype-chatgpt5/src/app/adapters/`:
  - auth.py - OAuth2 implementation quality
  - db.py - SQLAlchemy async patterns
  - storage.py - S3 streaming approach
  - routing.py - Routing logic structure

- [ ] Analyze `prototype-geminiNbt3pro/`:
  - Identify unique features or better implementations

- [ ] Analyze `prototype-grok4.1/`:
  - Compare test coverage and patterns

- [ ] Document findings in: `docs/development/prototype-analysis.md`
  - Table: Feature vs Prototype vs Recommendation
  - Rationale for selections

**Acceptance Criteria**:
- Clear decision on which implementation to use for each component
- Documented rationale for selections
- Identified any missing features across all prototypes

### 5.2 Core Adapter Implementation
**Goal**: Implement production adapters based on best prototype code.

**Tasks**:
- [ ] **Database Adapter** (`src/adapters/db.py`):
  - Port SQLAlchemy models from chosen prototype
  - Implement async session management
  - Add connection pooling configuration
  - Write integration tests: `tests/integration/test_db_adapter.py`

- [ ] **Storage Adapter** (`src/adapters/storage.py`):
  - Implement S3 client using aiobotocore
  - Add streaming upload/download (no in-memory buffering)
  - Mock S3 in tests using moto or similar
  - Write tests: `tests/integration/test_storage_adapter.py`

- [ ] **Routing Engine** (`src/adapters/routing.py`):
  - Port routing logic from prototype
  - Make routing rules configurable (not hardcoded)
  - Add caching layer (Redis) for routing rules
  - Write tests: `tests/unit/adapters/test_routing.py`

- [ ] **Authentication** (`src/adapters/auth.py`):
  - Implement JWT validation
  - Add JWKS caching
  - Create FastAPI dependency for auth
  - Write tests: `tests/unit/adapters/test_auth.py`

**Acceptance Criteria**:
- All adapters follow Protocol contracts
- Integration tests use real dependencies (testcontainers) or service emulators (e.g., moto for S3)
- Unit tests use mocks
- Streaming works for large files (>50MB)
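
For the JWKS caching task, one possible shape is a small time-to-live cache with an injectable fetcher and clock, which keeps the unit test free of network calls and sleeps. This is a sketch under assumptions: the class name `JwksCache`, the 300-second default, and the synchronous interface are all hypothetical, and a production version would likely be async and also refetch when a token references an unknown key id.

```python
import time
from typing import Callable, Optional


class JwksCache:
    """Caches a JWKS document for ttl_seconds before refetching."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # e.g. an HTTP call to the issuer's JWKS URL
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._keys: Optional[dict] = None
        self._expires_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._keys is None or now >= self._expires_at:
            self._keys = self._fetch()
            self._expires_at = now + self._ttl
        return self._keys
```

In a unit test, `fetch` can be a counting function and `clock` a lambda over a mutable value, so expiry can be asserted deterministically.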

### 5.3 API Layer Implementation
**Goal**: Build FastAPI routes with OpenAPI compliance.

**Tasks**:
- [ ] **Documents API** (`src/api/documents.py`):
  - POST /documents - implement with streaming upload
  - GET /documents/{id} - implement with ETag support
  - Add request validation (Pydantic)
  - Write integration tests: `tests/integration/test_documents_api.py`

- [ ] **Threads API** (`src/api/threads.py`):
  - POST /documents/{id}/threads
  - GET /threads/{id}/messages (cursor pagination)
  - POST /threads/{id}/messages
  - Write integration tests: `tests/integration/test_threads_api.py`

- [ ] **Exports API** (`src/api/exports.py`):
  - POST /exports (async job creation)
  - GET /exports/{jobId} (status polling)
  - Write integration tests: `tests/integration/test_exports_api.py`

- [ ] **Main App** (`src/main.py`):
  - Configure FastAPI with CORS, middleware
  - Include all routers
  - Add exception handlers
  - Add health check endpoint: GET /health

**Acceptance Criteria**:
- OpenAPI schema matches `docs/api/openapi.yaml`
- All endpoints have auth middleware
- Integration tests achieve >80% coverage
- API responses match documented format
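
The cursor pagination used by `GET /threads/{id}/messages` could follow a pattern like the one sketched here. ADR-003 is not reproduced in this workplan, so the cursor format (base64-encoded JSON holding the last seen id), the `nextCursor` field name, and the in-memory list standing in for a database query are all assumptions.

```python
import base64
import json
from typing import Optional


def encode_cursor(last_id: int) -> str:
    """Wrap the last seen message id in an opaque, URL-safe token."""
    raw = json.dumps({"after": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after"]


def list_messages(messages: list[dict], limit: int, cursor: Optional[str] = None) -> dict:
    """Return one page plus the cursor for the next page (None on the last page).

    Assumes `messages` is sorted by ascending positive integer id.
    """
    after = decode_cursor(cursor) if cursor else 0
    # Fetch one extra row to learn whether another page exists.
    window = [m for m in messages if m["id"] > after][: limit + 1]
    page = window[:limit]
    has_more = len(window) > limit
    return {
        "items": page,
        "nextCursor": encode_cursor(page[-1]["id"]) if has_more else None,
    }
```

Unlike offset pagination, the cursor stays valid when new messages are appended between requests, which is the usual motivation for this pattern.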

### 5.4 Background Workers
**Goal**: Implement async export worker.

**Tasks**:
- [ ] Choose task queue: ARQ (Redis-based, async) or Celery
- [ ] Implement `src/workers/exports_worker.py`:
  - Fetch document from storage
  - Fetch message history from DB
  - Generate export package (PDF + metadata)
  - Update job status

- [ ] Write worker tests: `tests/unit/workers/test_exports_worker.py`
- [ ] Document worker deployment in: `docs/development/worker-deployment.md`

**Acceptance Criteria**:
- Worker processes export jobs independently
- Failures are logged and job marked as FAILED
- Worker can be scaled horizontally
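
The failure-handling criterion can be expressed as a thin wrapper that always leaves a job in a terminal state. The sketch below is independent of the queue choice (ARQ or Celery); `run_export_job`, the `statuses` dictionary standing in for the job store, and the `build_package` callable are hypothetical names.

```python
import asyncio
import logging
from enum import Enum
from typing import Awaitable, Callable

logger = logging.getLogger("exports_worker")


class JobStatus(str, Enum):
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


async def run_export_job(
    job_id: str,
    statuses: dict[str, JobStatus],
    build_package: Callable[[str], Awaitable[bytes]],
) -> None:
    """Run one export job and always leave it in a terminal state."""
    statuses[job_id] = JobStatus.RUNNING
    try:
        await build_package(job_id)
    except Exception:
        # Log with traceback, then record the failure instead of crashing the worker.
        logger.exception("export job %s failed", job_id)
        statuses[job_id] = JobStatus.FAILED
    else:
        statuses[job_id] = JobStatus.COMPLETED
```

Because the function holds no state of its own, several worker processes can run it concurrently as long as the real job store handles concurrent updates.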

---

## 6. Phase 4: CI/CD & Quality Gates (Week 4)

### 6.1 GitHub Actions Workflows
**Goal**: Automate testing and quality checks.

**Tasks**:
- [ ] Create `.github/workflows/test.yml`:
  - Run on: push, pull_request
  - Matrix: Python 3.11, 3.12
  - Steps: Install deps, run pytest, upload coverage

- [ ] Create `.github/workflows/lint.yml`:
  - Run ruff linting
  - Run mypy type checking
  - Check code formatting

- [ ] Create `.github/workflows/integration.yml`:
  - Spin up PostgreSQL, Redis via services
  - Run integration tests with real dependencies

- [ ] Add status badges to README.md

**Acceptance Criteria**:
- All workflows pass on main branch
- Pull requests blocked if tests fail
- Coverage report available in PR comments

### 6.2 Pre-commit Hooks
**Goal**: Catch issues before commit.

**Tasks**:
- [ ] Create `.pre-commit-config.yaml`:
  - ruff linting and formatting
  - mypy type checking
  - trailing whitespace removal
  - YAML validation

- [ ] Document setup in: `docs/development/setup-guide.md`

**Acceptance Criteria**:
- Hooks auto-format code
- Hooks prevent commits with type errors
- Setup documented for new developers

### 6.3 Test Coverage Requirements
**Goal**: Enforce quality thresholds.

**Tasks**:
- [ ] Configure pytest-cov:
  - Minimum coverage: 80% (`--cov-fail-under=80` via `addopts` in `pytest.ini`)
  - Exclude: tests/, scripts/ (`omit` under `[tool.coverage.run]` in `pyproject.toml`)

- [ ] Add coverage badge to README.md
- [ ] Document coverage exemptions (e.g., `# pragma: no cover`)

**Acceptance Criteria**:
- `pytest --cov` fails if <80% coverage
- Coverage report generated in HTML format
- Uncovered lines are intentional and documented

---

## 7. Phase 5: Agentic Coding Enablement (Week 5)

### 7.1 Agentic Coding Guide
**Goal**: Create comprehensive guide for LLM-driven development.

**Tasks**:
- [ ] Expand `docs/development/agentic-coding-guide.md` (split out of `implementation_guide.md` in Section 3.3) with:
  - TDD workflow for Claude/GPT
  - Example prompts for generating tests
  - How to use Protocol adapters for mocking
  - Async testing patterns
  - Common pitfalls (GIL, blocking operations)

- [ ] Add example prompt templates:
  - "Write async pytest for POST /documents with ProcessPoolExecutor mock"
  - "Implement cursor pagination for messages following ADR-003"

- [ ] Update CLAUDE.md with agentic coding patterns

**Acceptance Criteria**:
- Guide includes concrete examples
- Prompts reference specific ADRs
- Guide covers both unit and integration test generation

### 7.2 Testing Utilities & Fixtures
**Goal**: Provide reusable test infrastructure.

**Tasks**:
- [ ] Create `tests/fixtures/factories.py`:
  - DocumentMetadataFactory (faker-based)
  - ThreadFactory
  - MessageFactory

- [ ] Create `tests/fixtures/db_fixtures.py`:
  - @pytest.fixture for async DB session
  - @pytest.fixture for testcontainers Postgres

- [ ] Create `tests/fixtures/auth_fixtures.py`:
  - Mock JWT tokens with different scopes
  - Mock JWKS endpoints

- [ ] Document in: `docs/development/testing-utilities.md`

**Acceptance Criteria**:
- Fixtures reduce boilerplate in tests
- Factories generate realistic test data
- Documentation shows usage examples
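
A factory can be as simple as a class that fills in valid defaults and lets each test override only the field it cares about. This sketch avoids faker and factory_boy to stay dependency-free; the `DocumentMetadata` dataclass is a stand-in for the real Pydantic model, and the default values and `REF-00001` format are invented for illustration.

```python
import itertools
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DocumentMetadata:
    # Stand-in for the real model; field names follow the tests in Section 4.1.
    authorityId: str
    referenceNumber: str
    issuedAt: date


class DocumentMetadataFactory:
    """Builds valid metadata with unique reference numbers; any field can be overridden."""

    def __init__(self) -> None:
        self._seq = itertools.count(1)

    def build(self, **overrides) -> DocumentMetadata:
        defaults = {
            "authorityId": "AUTH-001",                          # hypothetical value
            "referenceNumber": f"REF-{next(self._seq):05d}",    # hypothetical format
            "issuedAt": date(2025, 1, 15),
        }
        return DocumentMetadata(**{**defaults, **overrides})
```

A test for the empty `authorityId` case then reads `factory.build(authorityId="")`, with every other field guaranteed valid.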

### 7.3 ADR for Agentic Development
**Goal**: Document the TDD + AI approach as architectural decision.

**Tasks**:
- [ ] Create `docs/architecture/adr/0008-agentic-tdd-workflow.md`:
  - Context: LLM-driven development velocity vs. quality
  - Decision: Interface-first TDD with AI assistance
  - Rationale: Tests serve as executable specification
  - Implementation: Workflow, tooling, prompts

**Acceptance Criteria**:
- ADR approved by team
- Links to agentic-coding-guide.md
- Referenced in CLAUDE.md

---

## 8. Phase 6: Migration & Validation (Week 6)

### 8.1 Prototype Deprecation
**Goal**: Mark prototypes as archived.

**Tasks**:
- [ ] Add README.md to each prototype directory:
  - Status: ARCHIVED
  - Reason: Consolidated into main codebase
  - Date: 2025-12-XX

- [ ] Document migration decisions in: `docs/development/prototype-migration.md`
- [ ] Keep prototypes in repo for reference (don't delete)

**Acceptance Criteria**:
- Clear indication that prototypes are not maintained
- Migration rationale documented

### 8.2 End-to-End Validation
**Goal**: Verify complete system integration.

**Tasks**:
- [ ] Write E2E test: `tests/e2e/test_full_workflow.py`:
  - Citizen uploads document
  - Document is routed
  - Thread is created
  - Messages are exchanged
  - Export is generated

- [ ] Run against local environment (Docker Compose)
- [ ] Measure performance against NFRs:
  - Document upload + routing: <500ms
  - Message retrieval: <300ms

- [ ] Document E2E setup in: `docs/development/e2e-testing.md`

**Acceptance Criteria**:
- E2E test passes consistently
- Performance targets met
- E2E environment reproducible via Docker Compose
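
The NFR check could be automated with a small timing helper inside the E2E suite. This sketch assumes the budgets apply to the slowest observed run, which the plan does not specify (a percentile over more samples may be more appropriate); `measure_ms`, `within_budget`, and the budget keys are hypothetical names.

```python
import time
from typing import Callable

# Budgets from the NFR list above, in milliseconds.
BUDGETS_MS = {"upload_and_routing": 500.0, "message_retrieval": 300.0}


def measure_ms(operation: Callable[[], object], runs: int = 5) -> float:
    """Return the slowest wall-clock duration over `runs` calls, in milliseconds."""
    worst = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        worst = max(worst, (time.perf_counter() - start) * 1000.0)
    return worst


def within_budget(name: str, operation: Callable[[], object], runs: int = 5) -> bool:
    return measure_ms(operation, runs) < BUDGETS_MS[name]
```

In the E2E test, `operation` would wrap the HTTP call against the Docker Compose environment, for example a closure that posts a document and waits for the routed response.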

### 8.3 Documentation Review
**Goal**: Ensure all documentation is accurate and complete.

**Tasks**:
- [ ] Review all ADRs for consistency
- [ ] Update CLAUDE.md with final structure
- [ ] Review API documentation against implementation
- [ ] Spell check and grammar check all docs
- [ ] Generate API documentation from OpenAPI spec (ReDoc or Swagger UI)

**Acceptance Criteria**:
- No broken links in documentation
- Code examples in docs are tested
- CLAUDE.md accurately reflects current state

---

## 9. Success Criteria (Overall)

### Functional Requirements
- ✅ Main codebase has all features from best prototype
- ✅ All core APIs implemented and tested
- ✅ Background worker functional

### Quality Requirements
- ✅ Test coverage >80%
- ✅ Zero mypy type errors
- ✅ All linting rules pass
- ✅ CI/CD pipeline green

### Documentation Requirements
- ✅ ADRs in individual files with consistent structure
- ✅ All major decisions documented
- ✅ Agentic coding guide comprehensive
- ✅ CLAUDE.md accurate and complete

### Performance Requirements
- ✅ Document upload + routing <500ms (measured in E2E tests)
- ✅ Message retrieval <300ms (measured in E2E tests)
- ✅ Large file upload streaming works (>50MB test)

### Process Requirements
- ✅ TDD workflow established and documented
- ✅ Pre-commit hooks prevent quality issues
- ✅ GitHub Actions enforce quality gates
- ✅ Agentic development patterns proven with at least 3 features

---

## 10. Risk Mitigation

### Risk 1: Prototype Integration Conflicts
**Mitigation**: Complete the comparative analysis (Section 5.1, Phase 3) before implementation. Document decision rationale.

### Risk 2: TDD Slowing Initial Progress
**Mitigation**: Front-load interface definition (Phase 2). Once the interfaces are stable, implementation accelerates.

### Risk 3: Incomplete ADR Extraction
**Mitigation**: Use checklist approach. Review original `decisions.md` multiple times. Cross-reference with implementation guide.

### Risk 4: Agentic Coding Learning Curve
**Mitigation**: Create an example-driven guide. Include actual prompts that worked. Pair the agent with a human developer for the first few features.

### Risk 5: Performance Targets Not Met
**Mitigation**: Include performance testing from Phase 2 onwards. Identify bottlenecks early. Profile with py-spy or similar.

---

## 11. Next Steps

1. **Review this workplan** with the team
2. **Adjust timeline** based on team capacity
3. **Start Phase 1** (Foundation Setup)
4. **Daily standup** to track progress and blockers
5. **Weekly retrospective** to improve agentic coding workflow

---

## Appendix A: Recommended Tools

- **Testing**: pytest, pytest-asyncio, pytest-cov, hypothesis (property testing)
- **Mocking**: respx (HTTP), moto (AWS), testcontainers-python (real services)
- **Linting**: ruff (fast, replaces flake8 + isort + pyupgrade)
- **Type Checking**: mypy with strict mode
- **Factories**: factory_boy or custom Pydantic factories
- **Performance**: py-spy (profiling), locust (load testing)
- **Pre-commit**: pre-commit framework
- **CI/CD**: GitHub Actions (free for public repos)

---

## Appendix B: ADR Numbering Convention

- **0001-0099**: Core Architecture (payload model, auth, concurrency)
- **0100-0199**: Data & Persistence (pagination, retention, schema)
- **0200-0299**: API & Integration (REST structure, export workflow)
- **0300-0399**: Development Process (TDD, agentic coding)
- **0400+**: Future decisions

---

**Document Owner**: Backend Engineering Team
**Last Updated**: 2025-12-01
**Status**: Ready for Review