tegwick 9081cb80d3 chore: setup agentic coding workplan

2025-12-01 22:37:27 +01:00

Workplan: Main Codebase Integration & TDD Setup

Status: Draft v1.0
Date: 2025-12-01
Goal: Consolidate prototype learnings into a production-ready main codebase with TDD practices and proper ADR documentation structure.

1. Current State Assessment

Existing Assets

3 Prototype Implementations: chatgpt5 (most complete), geminiNbt3pro, grok4.1
Documentation:
- Single decisions.md file with all ADRs (needs restructuring)
- implementation_guide.md (comprehensive)
- api_docs.md (scenario-based)
- openapi.yaml (API spec)
Architecture: Clean Architecture pattern established in chatgpt5 prototype

Gaps

No unified main codebase
ADRs not in individual files (not easily referenceable)
No TDD test suite defining core interfaces
No CI/CD pipeline configuration
Prototypes have overlapping implementations without consolidation

2. Target State Definition

Main Codebase Structure

/
├── docs/
│   ├── architecture/
│   │   ├── adr/                    # Individual ADR files
│   │   │   ├── 0001-split-payload-model.md
│   │   │   ├── 0002-stateless-authentication.md
│   │   │   └── ...
│   │   ├── architecture-overview.md
│   │   └── design-patterns.md
│   ├── api/
│   │   ├── openapi.yaml
│   │   └── api-scenarios.md
│   └── development/
│       ├── testing-strategy.md
│       └── agentic-coding-guide.md
├── src/
│   ├── domain/          # Pure business logic (TDD tested)
│   ├── adapters/        # External integrations (mocked in tests)
│   ├── service/         # Application services (TDD tested)
│   ├── api/             # FastAPI routes (integration tested)
│   └── workers/         # Background jobs
├── tests/
│   ├── unit/            # TDD unit tests
│   ├── integration/     # Integration tests
│   └── fixtures/        # Test data and mocks
├── scripts/
│   └── init_db.py
├── pyproject.toml       # Dependencies & tool config
├── pytest.ini           # Test configuration
├── .github/
│   └── workflows/       # CI/CD pipelines
└── CLAUDE.md            # AI coding assistant guidance

TDD Approach

Interface-first: Define Pydantic models and service interfaces via tests
Red-Green-Refactor: Write failing test → implement → refactor
Test pyramid: Many unit tests, fewer integration tests, minimal e2e tests
Async testing: Use pytest-asyncio for async operations

ADR Documentation Standards

Format: Follow Markdown Any Decision Records (MADR) template
Naming: NNNN-title-with-dashes.md (e.g., 0001-split-payload-model.md)
Location: /docs/architecture/adr/

Template:

# ADR-NNNN: [Title]

**Status:** [Accepted|Proposed|Deprecated|Superseded]
**Date:** YYYY-MM-DD
**Deciders:** [Team/Role]

## Context and Problem Statement
[What is the issue we're addressing?]

## Decision Drivers
* [Driver 1]
* [Driver 2]

## Considered Options
* Option 1
* Option 2

## Decision Outcome
Chosen option: [option], because [rationale].

### Positive Consequences
* [Consequence 1]

### Negative Consequences
* [Consequence 1]

## Implementation Notes
[Specific technical guidance]
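
The naming convention above (NNNN-title-with-dashes.md) can be checked or generated with a small helper. This is a minimal sketch: the function name `adr_filename` and the exact slug rules (lowercase, every run of non-alphanumeric characters collapsed to one dash) are assumptions for illustration, not something the workplan prescribes.

```python
import re


def adr_filename(number: int, title: str) -> str:
    """Build an ADR filename in the NNNN-title-with-dashes.md form."""
    if not 0 <= number <= 9999:
        raise ValueError("ADR number must fit in four digits")
    # Lowercase the title and collapse each run of non-alphanumerics to one dash.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not slug:
        raise ValueError("ADR title must contain at least one letter or digit")
    return f"{number:04d}-{slug}.md"
```

For example, `adr_filename(1, "Split-Payload Model")` yields `0001-split-payload-model.md`, matching the example in the naming rule.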

3. Phase 1: Foundation Setup (Week 1)

3.1 ADR Restructuring

Goal: Extract individual ADRs from decisions.md into separate files.

Tasks:

Create /docs/architecture/adr/ directory
Create ADR template file: docs/architecture/adr/0000-template.md
Extract ADR-001 (Split-Payload Model) → 0001-split-payload-model.md
Extract ADR-002 (Stateless Auth) → 0002-stateless-authentication.md
Extract ADR-003 (Pagination) → 0003-cursor-based-pagination.md
Extract ADR-004 (Async Exports) → 0004-async-export-workflow.md
Extract ADR-005 (Resource Naming) → 0005-rest-resource-structure.md
Extract ADR-006 (Data Retention) → 0006-gdpr-retention-model.md
Extract ADR-007 (Python & ProcessPoolExecutor) → 0007-hybrid-concurrency-pattern.md
Create index file: docs/architecture/adr/README.md with ADR list and links
Update decisions.md with deprecation notice and redirect to ADR directory

Acceptance Criteria:

Each ADR is a standalone markdown file
ADRs follow consistent template structure
Index file allows easy navigation
CLAUDE.md updated to reference new ADR location

3.2 Main Codebase Scaffolding

Goal: Create production-ready directory structure with tooling.

Tasks:

Create /src/ directory with Clean Architecture structure
Initialize pyproject.toml with dependencies from prototype-chatgpt5
Add dev dependencies: pytest, pytest-asyncio, pytest-cov, httpx, respx, mypy, ruff
Create pytest.ini with async and coverage configuration
Create .gitignore (Python, IDE, env files)
Create tests/ structure: unit/, integration/, fixtures/
Create scripts/ directory
Set up pre-commit hooks configuration (ruff, mypy)

Acceptance Criteria:

Directory structure matches target state
pip install -e ".[dev]" works
pytest runs (even with 0 tests; exit code 5, "no tests collected", is expected until the first test lands)
Type checking with mypy src/ works

3.3 Documentation Organization

Goal: Restructure documentation for clarity and AI assistant consumption.

Tasks:

Create /docs/architecture/ directory
Create /docs/api/ directory
Create /docs/development/ directory
Move openapi.yaml → docs/api/openapi.yaml
Move api_docs.md → docs/api/api-scenarios.md
Refactor implementation_guide.md → split into:
- docs/architecture/design-patterns.md (architectural patterns)
- docs/development/testing-strategy.md (TDD approach)
- docs/development/agentic-coding-guide.md (LLM-specific guidance)
Create docs/architecture/architecture-overview.md (high-level system design)
Update CLAUDE.md with new documentation structure

Acceptance Criteria:

Documentation is logically organized by concern
Each document has a single, clear purpose
CLAUDE.md points to correct locations

4. Phase 2: TDD Interface Definition (Week 2)

4.1 Domain Models (TDD)

Goal: Define core domain models with comprehensive test coverage.

Approach: Write tests first, then implement models.

Tasks:

Test: tests/unit/domain/test_document_metadata.py
- Test: Valid metadata creation
- Test: Invalid authorityId (empty, too long)
- Test: Invalid referenceNumber format
- Test: Future issuedAt date validation
- Implement: src/domain/models.py → DocumentMetadata
Test: tests/unit/domain/test_thread_models.py
- Test: ThreadType enum values
- Test: SenderRole enum values
- Test: ThreadCreateRequest validation
- Test: Message model with encrypted content
- Implement: Thread-related models
Test: tests/unit/domain/test_document_envelope.py
- Test: Split payload structure
- Test: encryptedPayload validation (base64, size limits)
- Test: Status transitions (RECEIVED → ROUTED → ASSIGNED → CLOSED)
- Implement: DocumentEnvelope and status management
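
As an illustration of the test-first approach for the status transitions listed above, the sketch below pairs two tests with the smallest implementation that satisfies them. It assumes the workflow is strictly linear and that CLOSED is terminal; the names `DocumentStatus`, `ALLOWED_TRANSITIONS`, `InvalidTransition` and `transition` are hypothetical, and the real models are expected to be Pydantic-based.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    RECEIVED = "RECEIVED"
    ROUTED = "ROUTED"
    ASSIGNED = "ASSIGNED"
    CLOSED = "CLOSED"


# Each status maps to the set of statuses it may move to next.
# Assumption: transitions are strictly linear and CLOSED is terminal.
ALLOWED_TRANSITIONS = {
    DocumentStatus.RECEIVED: {DocumentStatus.ROUTED},
    DocumentStatus.ROUTED: {DocumentStatus.ASSIGNED},
    DocumentStatus.ASSIGNED: {DocumentStatus.CLOSED},
    DocumentStatus.CLOSED: set(),
}


class InvalidTransition(ValueError):
    """Raised when a status change skips or reverses the workflow."""


def transition(current: DocumentStatus, target: DocumentStatus) -> DocumentStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target


def test_happy_path_walks_all_statuses() -> None:
    status = DocumentStatus.RECEIVED
    for nxt in (DocumentStatus.ROUTED, DocumentStatus.ASSIGNED, DocumentStatus.CLOSED):
        status = transition(status, nxt)
    assert status is DocumentStatus.CLOSED


def test_skipping_a_step_is_rejected() -> None:
    try:
        transition(DocumentStatus.RECEIVED, DocumentStatus.CLOSED)
    except InvalidTransition:
        return
    raise AssertionError("expected InvalidTransition")
```

Keeping the allowed moves in one table means a new status only requires a new table entry plus a test, rather than edits to branching logic.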

Acceptance Criteria:

All domain models have >90% test coverage
Pydantic validation catches invalid inputs
Tests run in <1 second
Zero mypy type errors

4.2 Service Layer Interfaces (TDD)

Goal: Define service contracts via protocol/ABC classes with tests.

Tasks:

Test: tests/unit/service/test_documents_service.py
- Mock: Database adapter
- Mock: Storage adapter (S3)
- Mock: Routing engine
- Test: create_document() - happy path
- Test: create_document() - routing failure
- Test: create_document() - storage failure
- Test: get_document() - found
- Test: get_document() - not found
- Test: Retention date calculation (default 90 days)
- Implement: src/service/documents_service.py
Test: tests/unit/service/test_threads_service.py
- Test: create_thread() - links to document
- Test: create_thread() - document not found (404)
- Test: list_messages() - cursor pagination
- Test: add_message() - role validation
- Implement: src/service/threads_service.py
Test: tests/unit/service/test_exports_service.py
- Test: create_export_job() - returns jobId
- Test: get_export_status() - job states (QUEUED, RUNNING, COMPLETED, FAILED)
- Test: Async job enqueuing (mock Redis/ARQ)
- Implement: src/service/exports_service.py
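
A service test with mocked adapters might look like the sketch below, which covers the happy path, a storage failure, and the 90-day retention default from the list above. Several things are assumptions made for illustration: the constructor signature of `DocumentsService`, the `StoredDocument` and `StorageError` types, the `insert` method on the database adapter, and the choice to write to storage before the database. The sketch drives coroutines with `asyncio.run` so it has no third-party dependencies; in the project itself pytest-asyncio would play that role.

```python
import asyncio
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from unittest.mock import AsyncMock

DEFAULT_RETENTION_DAYS = 90


class StorageError(Exception):
    """Raised by the (hypothetical) storage adapter when a write fails."""


@dataclass
class StoredDocument:
    document_id: str
    assigned_unit: str
    retention_until: datetime


class DocumentsService:
    def __init__(self, db, storage, routing) -> None:
        self._db, self._storage, self._routing = db, storage, routing

    async def create_document(self, document_id: str, payload: bytes, now: datetime) -> StoredDocument:
        # Store the payload first so a storage failure leaves no database row behind.
        await self._storage.save_encrypted_payload(document_id, payload)
        unit = await self._routing.route_document(document_id)
        doc = StoredDocument(document_id, unit, now + timedelta(days=DEFAULT_RETENTION_DAYS))
        await self._db.insert(doc)
        return doc


def make_service():
    """Wire the service to AsyncMock adapters; routing always answers 'unit-a'."""
    db, storage, routing = AsyncMock(), AsyncMock(), AsyncMock()
    routing.route_document.return_value = "unit-a"
    return DocumentsService(db, storage, routing), db, storage


def test_create_document_happy_path() -> None:
    service, db, _ = make_service()
    now = datetime(2025, 12, 1, tzinfo=timezone.utc)
    doc = asyncio.run(service.create_document("doc-1", b"ciphertext", now))
    assert doc.retention_until == now + timedelta(days=90)
    db.insert.assert_awaited_once_with(doc)


def test_create_document_storage_failure_skips_db() -> None:
    service, db, storage = make_service()
    storage.save_encrypted_payload.side_effect = StorageError("bucket unavailable")
    now = datetime(2025, 12, 1, tzinfo=timezone.utc)
    try:
        asyncio.run(service.create_document("doc-1", b"ciphertext", now))
    except StorageError:
        db.insert.assert_not_awaited()
        return
    raise AssertionError("expected StorageError")
```

Passing `now` in explicitly keeps the retention calculation deterministic, so the test does not depend on the wall clock.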

Acceptance Criteria:

Service layer has clear, testable interfaces
All external dependencies are mocked
Tests verify business logic, not infrastructure
Each service method has both success and failure test cases

4.3 Adapter Contracts (Protocols)

Goal: Define adapter interfaces using Python Protocols for mockability.

Tasks:

Create src/adapters/protocols.py:
- StorageAdapter protocol (save_encrypted_payload, get_encrypted_payload)
- DatabaseAdapter protocol (CRUD operations)
- RoutingEngine protocol (route_document)
- JobQueue protocol (enqueue, get_status)
Create stub implementations for testing:
- tests/fixtures/storage_stub.py (in-memory storage)
- tests/fixtures/db_stub.py (in-memory DB)
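
The sketch below shows one way the StorageAdapter protocol and its in-memory stub could fit together. The two method names come from the task list above; the parameter names, the `InMemoryStorage` class name and the `round_trip` helper are assumptions. Because `typing.Protocol` uses structural typing, the stub does not need to inherit from the protocol.

```python
import asyncio
from typing import Dict, Protocol, runtime_checkable


@runtime_checkable
class StorageAdapter(Protocol):
    """Narrow contract: only the two operations the service layer needs."""

    async def save_encrypted_payload(self, key: str, payload: bytes) -> None: ...

    async def get_encrypted_payload(self, key: str) -> bytes: ...


class InMemoryStorage:
    """Test stub that satisfies StorageAdapter structurally, without inheriting from it."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    async def save_encrypted_payload(self, key: str, payload: bytes) -> None:
        self._blobs[key] = payload

    async def get_encrypted_payload(self, key: str) -> bytes:
        return self._blobs[key]  # raises KeyError for an unknown key


async def round_trip(storage: StorageAdapter, key: str, payload: bytes) -> bytes:
    """Code written against the protocol works with any conforming adapter."""
    await storage.save_encrypted_payload(key, payload)
    return await storage.get_encrypted_payload(key)
```

Note that `runtime_checkable` only verifies that the methods exist at `isinstance` time; signature mismatches are caught by mypy, not at runtime.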

Acceptance Criteria:

Protocols are narrow and focused
Test fixtures implement all protocols
Production adapters can be swapped without changing service layer

5. Phase 3: Prototype Integration (Week 3)

5.1 Comparative Analysis

Goal: Identify best implementations across prototypes.

Tasks:

Analyze prototype-chatgpt5/src/app/adapters/:
- auth.py - OAuth2 implementation quality
- db.py - SQLAlchemy async patterns
- storage.py - S3 streaming approach
- routing.py - Routing logic structure
Analyze prototype-geminiNbt3pro/:
- Identify unique features or better implementations
Analyze prototype-grok4.1/:
- Compare test coverage and patterns
Document findings in: docs/development/prototype-analysis.md
- Table: Feature vs Prototype vs Recommendation
- Rationale for selections

Acceptance Criteria:

Clear decision on which implementation to use for each component
Documented rationale for selections
Identified any missing features across all prototypes

5.2 Core Adapter Implementation

Goal: Implement production adapters based on best prototype code.

Tasks:

Database Adapter (src/adapters/db.py):
- Port SQLAlchemy models from chosen prototype
- Implement async session management
- Add connection pooling configuration
- Write integration tests: tests/integration/test_db_adapter.py
Storage Adapter (src/adapters/storage.py):
- Implement S3 client using aiobotocore
- Add streaming upload/download (no in-memory buffering)
- Mock S3 in tests using moto or similar
- Write tests: tests/integration/test_storage_adapter.py
Routing Engine (src/adapters/routing.py):
- Port routing logic from prototype
- Make routing rules configurable (not hardcoded)
- Add caching layer (Redis) for routing rules
- Write tests: tests/unit/adapters/test_routing.py
Authentication (src/adapters/auth.py):
- Implement JWT validation
- Add JWKS caching
- Create FastAPI dependency for auth
- Write tests: tests/unit/adapters/test_auth.py
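
JWKS caching, listed under the authentication adapter above, can be sketched as a small time-based cache. Everything here is an assumption for illustration: the class name `JwksCache`, the 300-second default TTL, the injected `fetch` and `clock` callables, and the policy of refreshing once when an unknown key ID appears. Only the `{"keys": [{"kid": ...}]}` layout follows the JWK Set format. Signature verification itself is out of scope for this sketch.

```python
import time
from typing import Callable, Dict, Optional


class JwksCache:
    """Caches a JWK Set for ttl_seconds; the network fetch is injected for testability."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._keys: Optional[Dict[str, dict]] = None
        self._expires_at = 0.0

    def _refresh(self, now: float) -> None:
        jwks = self._fetch()
        self._keys = {key["kid"]: key for key in jwks["keys"]}
        self._expires_at = now + self._ttl

    def get_key(self, kid: str) -> dict:
        """Return the JWK for kid, refreshing when the cache is empty or stale."""
        now = self._clock()
        refreshed = False
        if self._keys is None or now >= self._expires_at:
            self._refresh(now)
            refreshed = True
        if kid not in self._keys and not refreshed:
            # An unknown kid may mean the issuer rotated keys: refresh once.
            self._refresh(now)
        return self._keys[kid]  # raises KeyError if the kid is still unknown
```

Injecting the clock lets unit tests advance time without sleeping, which keeps them within the sub-second budget set for the unit suite.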

Acceptance Criteria:

All adapters follow Protocol contracts
Integration tests use real dependencies (testcontainers)
Unit tests use mocks
Streaming works for large files (>50MB)
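
The streaming criterion can be exercised without S3 by checking that data moves in bounded chunks. The sketch below is a stand-in, not the aiobotocore integration: `sink` is a hypothetical async callable that would be replaced by the real part-upload call, and the 1 MiB chunk size is an assumed value.

```python
import asyncio
import hashlib
import io
from typing import AsyncIterator, Awaitable, BinaryIO, Callable, Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MiB per chunk (assumed value)


async def iter_chunks(source: BinaryIO, chunk_size: int = CHUNK_SIZE) -> AsyncIterator[bytes]:
    """Yield the source in fixed-size chunks instead of reading it whole."""
    while True:
        # Run the blocking read in a worker thread so the event loop stays responsive.
        chunk = await asyncio.to_thread(source.read, chunk_size)
        if not chunk:
            break
        yield chunk


async def upload_streaming(
    source: BinaryIO, sink: Callable[[bytes], Awaitable[None]]
) -> Tuple[int, str]:
    """Forward chunks to sink and return (bytes sent, SHA-256 hex digest)."""
    digest = hashlib.sha256()
    total = 0
    async for chunk in iter_chunks(source):
        await sink(chunk)
        digest.update(chunk)
        total += len(chunk)
    return total, digest.hexdigest()
```

At most one chunk is held in memory at a time, so peak memory stays near `CHUNK_SIZE` regardless of file size. A test for the >50MB case can feed a large `io.BytesIO` or a temporary file and assert on the byte count and digest.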

5.3 API Layer Implementation

Goal: Build FastAPI routes with OpenAPI compliance.

Tasks:

Documents API (src/api/documents.py):
- POST /documents - implement with streaming upload
- GET /documents/{id} - implement with ETag support
- Add request validation (Pydantic)
- Write integration tests: tests/integration/test_documents_api.py
Threads API (src/api/threads.py):
- POST /documents/{id}/threads
- GET /threads/{id}/messages (cursor pagination)
- POST /threads/{id}/messages
- Write integration tests: tests/integration/test_threads_api.py
Exports API (src/api/exports.py):
- POST /exports (async job creation)
- GET /exports/{jobId} (status polling)
- Write integration tests: tests/integration/test_exports_api.py
Main App (src/main.py):
- Configure FastAPI with CORS, middleware
- Include all routers
- Add exception handlers
- Add health check endpoint: GET /health
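
The cursor pagination behind GET /threads/{id}/messages could follow the pattern sketched below. The details of ADR-003 are not reproduced in this workplan, so the sketch assumes keyset pagination over a monotonically increasing integer `id` and an opaque base64-encoded JSON cursor; the function names and the in-memory `messages` list (standing in for a database query) are hypothetical.

```python
import base64
import json
from typing import List, Optional, Tuple


def encode_cursor(last_id: int) -> str:
    """Wrap the last seen id in an opaque, URL-safe token."""
    raw = json.dumps({"after": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> int:
    return int(json.loads(base64.urlsafe_b64decode(cursor.encode()))["after"])


def list_messages(
    messages: List[dict], cursor: Optional[str], limit: int
) -> Tuple[List[dict], Optional[str]]:
    """Return one page ordered by id, plus the cursor for the next page (or None)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    after = decode_cursor(cursor) if cursor else None
    ordered = sorted(messages, key=lambda m: m["id"])
    # Take one extra row to learn whether another page exists.
    window = [m for m in ordered if after is None or m["id"] > after][: limit + 1]
    page = window[:limit]
    next_cursor = encode_cursor(page[-1]["id"]) if len(window) > limit else None
    return page, next_cursor
```

Unlike offset pagination, a cursor keyed on the last seen id stays stable when new messages arrive between requests.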

Acceptance Criteria:

OpenAPI schema matches docs/api/openapi.yaml
All endpoints have auth middleware
Integration tests achieve >80% coverage
API responses match documented format

5.4 Background Workers

Goal: Implement async export worker.

Tasks:

Choose task queue: ARQ (Redis-based, async) or Celery
Implement src/workers/exports_worker.py:
- Fetch document from storage
- Fetch message history from DB
- Generate export package (PDF + metadata)
- Update job status
Write worker tests: tests/unit/workers/test_exports_worker.py
Document worker deployment in: docs/development/worker-deployment.md
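
The worker's status handling can be sketched independently of the queue choice. The job states below are the ones named for the exports service tests in Section 4.2; the `run_export_job` function, the `statuses` dictionary (standing in for the job table) and the `build_package` callable are assumptions, and neither ARQ nor Celery is used here.

```python
import asyncio
import logging
from enum import Enum
from typing import Awaitable, Callable, Dict

logger = logging.getLogger("exports_worker")


class JobState(str, Enum):
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


async def run_export_job(
    job_id: str,
    statuses: Dict[str, JobState],
    build_package: Callable[[str], Awaitable[bytes]],
) -> None:
    """Run one export job and record its final state in statuses."""
    statuses[job_id] = JobState.RUNNING
    try:
        await build_package(job_id)
    except Exception:
        # Log with traceback, then mark the job FAILED instead of crashing the worker.
        logger.exception("export job %s failed", job_id)
        statuses[job_id] = JobState.FAILED
    else:
        statuses[job_id] = JobState.COMPLETED
```

Because the function holds no state of its own, several worker processes can run it concurrently on different job IDs, which is what horizontal scaling requires.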

Acceptance Criteria:

Worker processes export jobs independently
Failures are logged and job marked as FAILED
Worker can be scaled horizontally

6. Phase 4: CI/CD & Quality Gates (Week 4)

6.1 GitHub Actions Workflows

Goal: Automate testing and quality checks.

Tasks:

Create .github/workflows/test.yml:
- Run on: push, pull_request
- Matrix: Python 3.11, 3.12
- Steps: Install deps, run pytest, upload coverage
Create .github/workflows/lint.yml:
- Run ruff linting
- Run mypy type checking
- Check code formatting
Create .github/workflows/integration.yml:
- Spin up PostgreSQL, Redis via services
- Run integration tests with real dependencies
Add status badges to README.md

Acceptance Criteria:

All workflows pass on main branch
Pull requests blocked if tests fail
Coverage report available in PR comments

6.2 Pre-commit Hooks

Goal: Catch issues before commit.

Tasks:

Create .pre-commit-config.yaml:
- ruff linting and formatting
- mypy type checking
- trailing whitespace removal
- YAML validation
Document setup in: docs/development/setup-guide.md

Acceptance Criteria:

Hooks auto-format code
Hooks prevent commits with type errors
Setup documented for new developers

6.3 Test Coverage Requirements

Goal: Enforce quality thresholds.

Tasks:

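Configure pytest-cov (addopts in pytest.ini, plus coverage.py settings for exclusions):
- Minimum coverage: 80% (--cov-fail-under=80)
- Exclude: tests/, scripts/ (coverage.py omit setting)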
Add coverage badge to README.md
Document coverage exemptions (e.g., # pragma: no cover)

Acceptance Criteria:

pytest --cov fails if <80% coverage
Coverage report generated in HTML format
Uncovered lines are intentional and documented

7. Phase 5: Agentic Coding Enablement (Week 5)

7.1 Agentic Coding Guide

Goal: Create comprehensive guide for LLM-driven development.

Tasks:

Create docs/development/agentic-coding-guide.md:
- TDD workflow for Claude/GPT
- Example prompts for generating tests
- How to use Protocol adapters for mocking
- Async testing patterns
- Common pitfalls (GIL, blocking operations)
Add example prompt templates:
- "Write async pytest for POST /documents with ProcessPoolExecutor mock"
- "Implement cursor pagination for messages following ADR-003"
Update CLAUDE.md with agentic coding patterns

Acceptance Criteria:

Guide includes concrete examples
Prompts reference specific ADRs
Guide covers both unit and integration test generation

7.2 Testing Utilities & Fixtures

Goal: Provide reusable test infrastructure.

Tasks:

Create tests/fixtures/factories.py:
- DocumentMetadataFactory (faker-based)
- ThreadFactory
- MessageFactory
Create tests/fixtures/db_fixtures.py:
- @pytest.fixture for async DB session
- @pytest.fixture for testcontainers Postgres
Create tests/fixtures/auth_fixtures.py:
- Mock JWT tokens with different scopes
- Mock JWKS endpoints
Document in: docs/development/testing-utilities.md
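
A factory for test data could look like the sketch below. It is a dependency-free stand-in: the workplan calls for a faker-based factory, whereas this version uses a plain counter for uniqueness. The snake_case field names are assumed Python equivalents of authorityId, referenceNumber and issuedAt, and the dataclass stands in for the Pydantic model.

```python
import itertools
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DocumentMetadata:
    # Stand-in for the Pydantic model; field types are assumed.
    authority_id: str
    reference_number: str
    issued_at: datetime


class DocumentMetadataFactory:
    """Builds valid metadata with unique reference numbers; any field can be overridden."""

    _sequence = itertools.count(1)

    @classmethod
    def build(cls, **overrides) -> DocumentMetadata:
        n = next(cls._sequence)
        defaults = {
            "authority_id": "AUTH-001",
            "reference_number": f"REF-{n:06d}",
            "issued_at": datetime(2025, 1, 1, tzinfo=timezone.utc),
        }
        defaults.update(overrides)
        return DocumentMetadata(**defaults)
```

A test that only cares about one field can then write `DocumentMetadataFactory.build(authority_id="")` and leave the rest to the defaults.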

Acceptance Criteria:

Fixtures reduce boilerplate in tests
Factories generate realistic test data
Documentation shows usage examples

7.3 ADR for Agentic Development

Goal: Document the TDD + AI approach as an architectural decision.

Tasks:

Create docs/architecture/adr/0008-agentic-tdd-workflow.md:
- Context: LLM-driven development velocity vs. quality
- Decision: Interface-first TDD with AI assistance
- Rationale: Tests serve as executable specification
- Implementation: Workflow, tooling, prompts

Acceptance Criteria:

ADR approved by team
Links to agentic-coding-guide.md
Referenced in CLAUDE.md

8. Phase 6: Migration & Validation (Week 6)

8.1 Prototype Deprecation

Goal: Mark prototypes as archived.

Tasks:

Add README.md to each prototype directory:
- Status: ARCHIVED
- Reason: Consolidated into main codebase
- Date: 2025-12-XX
Document migration decisions in: docs/development/prototype-migration.md
Keep prototypes in repo for reference (don't delete)

Acceptance Criteria:

Clear indication that prototypes are not maintained
Migration rationale documented

8.2 End-to-End Validation

Goal: Verify complete system integration.

Tasks:

Write E2E test: tests/e2e/test_full_workflow.py:
- Citizen uploads document
- Document is routed
- Thread is created
- Messages are exchanged
- Export is generated
Run against local environment (Docker Compose)
Measure performance against NFRs:
- Document upload + routing: <500ms
- Message retrieval: <300ms
Document E2E setup in: docs/development/e2e-testing.md
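
The latency targets above can be checked with a small timing helper. The workplan does not say whether the targets apply to the mean, the maximum or a percentile, so the sketch below assumes the 95th percentile (nearest-rank method); the names `BUDGETS`, `measure` and `p95` are hypothetical.

```python
import asyncio
import time
from typing import Awaitable, Callable, List

# Budgets in seconds, taken from the NFR list above.
BUDGETS = {"upload_and_routing": 0.500, "message_retrieval": 0.300}


async def measure(call: Callable[[], Awaitable[object]], runs: int = 20) -> List[float]:
    """Return wall-clock durations in seconds for `runs` sequential invocations."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        await call()
        durations.append(time.perf_counter() - start)
    return durations


def p95(durations: List[float]) -> float:
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    if not durations:
        raise ValueError("no measurements")
    ordered = sorted(durations)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]
```

In an E2E test, `call` would wrap the HTTP request against the Docker Compose environment, followed by an assertion such as `p95(durations) < BUDGETS["message_retrieval"]`.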

Acceptance Criteria:

E2E test passes consistently
Performance targets met
E2E environment reproducible via Docker Compose

8.3 Documentation Review

Goal: Ensure all documentation is accurate and complete.

Tasks:

Review all ADRs for consistency
Update CLAUDE.md with final structure
Review API documentation against implementation
Spell check and grammar check all docs
Generate API documentation from OpenAPI spec (ReDoc or Swagger UI)

Acceptance Criteria:

No broken links in documentation
Code examples in docs are tested
CLAUDE.md accurately reflects current state

9. Success Criteria (Overall)

Functional Requirements

✅ Main codebase has all features from best prototype
✅ All core APIs implemented and tested
✅ Background worker functional

Quality Requirements

✅ Test coverage >80%
✅ Zero mypy type errors
✅ All linting rules pass
✅ CI/CD pipeline green

Documentation Requirements

✅ ADRs in individual files with consistent structure
✅ All major decisions documented
✅ Agentic coding guide comprehensive
✅ CLAUDE.md accurate and complete

Performance Requirements

✅ Document upload + routing <500ms (measured in E2E tests)
✅ Message retrieval <300ms (measured in E2E tests)
✅ Large file upload streaming works (>50MB test)

Process Requirements

✅ TDD workflow established and documented
✅ Pre-commit hooks prevent quality issues
✅ GitHub Actions enforce quality gates
✅ Agentic development patterns proven with at least 3 features

10. Risk Mitigation

Risk 1: Prototype Integration Conflicts

Mitigation: Complete the comparative analysis (Section 5.1) before implementation. Document the decision rationale.

Risk 2: TDD Slowing Initial Progress

Mitigation: Front-load interface definition (Phase 2). Once the interfaces are stable, implementation accelerates.

Risk 3: Incomplete ADR Extraction

Mitigation: Use a checklist approach. Review the original decisions.md multiple times. Cross-reference with the implementation guide.

Risk 4: Agentic Coding Learning Curve

Mitigation: Create an example-driven guide. Include actual prompts that worked. Pair with a human for the first few features.

Risk 5: Performance Targets Not Met

Mitigation: Include performance testing from Phase 2 onwards. Identify bottlenecks early. Profile with py-spy or similar.

11. Next Steps

Review this workplan with the team
Adjust timeline based on team capacity
Start Phase 1 (Foundation Setup)
Daily standup to track progress and blockers
Weekly retrospective to improve agentic coding workflow

Appendix A: Recommended Tools

Testing: pytest, pytest-asyncio, pytest-cov, hypothesis (property testing)
Mocking: respx (HTTP), moto (AWS), testcontainers-python (real services)
Linting: ruff (fast, replaces flake8 + isort + pyupgrade)
Type Checking: mypy with strict mode
Factories: factory_boy or custom Pydantic factories
Performance: py-spy (profiling), locust (load testing)
Pre-commit: pre-commit framework
CI/CD: GitHub Actions (free for public repos)

Appendix B: ADR Numbering Convention

0001-0099: Core Architecture (payload model, auth, concurrency)
0100-0199: Data & Persistence (pagination, retention, schema)
0200-0299: API & Integration (REST structure, export workflow)
0300-0399: Development Process (TDD, agentic coding)
0400+: Future decisions
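
The ranges above can be encoded as a lookup, for example to validate new ADR numbers in a pre-commit check. This is a minimal sketch; the names `ADR_RANGES` and `adr_category` are hypothetical, and treating 0000 as invalid assumes that number stays reserved for the template file.

```python
ADR_RANGES = [
    (1, 99, "Core Architecture"),
    (100, 199, "Data & Persistence"),
    (200, 299, "API & Integration"),
    (300, 399, "Development Process"),
]


def adr_category(number: int) -> str:
    """Map an ADR number to its category under the numbering convention."""
    if number < 1:
        raise ValueError("ADR numbers start at 0001 (0000 is assumed reserved for the template)")
    for low, high, name in ADR_RANGES:
        if low <= number <= high:
            return name
    return "Future decisions"  # 0400 and above
```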

Document Owner: Backend Engineering Team
Last Updated: 2025-12-01
Status: Ready for Review
