ComposableRepositoryParadigm
Bernd Worsch edited this page 2025-10-04 22:56:34 +00:00

Composable-Repository Paradigm Specification

Overview

The composable-repository paradigm is a software development approach designed to facilitate incremental modularity in projects, particularly for Python applications. It addresses the common challenge where developers identify reusable "capabilities" (self-contained functionalities including code, tests, documentation, and configurations) during active development but face friction in extracting them into independent repositories. By organizing these capabilities as subdirectories within the main repository—mirroring the main repo's structure—this paradigm enables gradual separation without disrupting the core project. The goal is to maintain a compact, cohesive codebase while preparing for reusability, promoting principles like separation of concerns, DRY (Don't Repeat Yourself), and eventual library extraction.

This specification is tailored for Python projects, leveraging ecosystem tools like Poetry, setuptools, or Hatch for dependency management and packaging. It assumes a monorepo-like starting point where the main project evolves, and capabilities are nurtured until ready for independence.

Key Concepts

  • Capability: A modular unit of functionality that could stand alone as a library or package. Examples include a custom logging handler, a data validation module, or a utility for API interactions. Each capability includes:
    • Source code (e.g., Python modules/classes).
    • Tests (e.g., using pytest).
    • Documentation (e.g., README.md, docstrings).
    • Configurations (e.g., pyproject.toml snippets, requirements files).
  • Subdirectory as Proto-Repository: Each capability lives in a dedicated subdirectory (e.g., /capabilities/my_capability) that mimics the main repository's structure. This ensures consistency and eases eventual extraction.
  • Incremental Separation: Capabilities start intertwined with the core but are refactored over time to reduce dependencies. Automated import checks track that progress until a capability can be split into a standalone repository (e.g., via git subtree split).
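One way a capability can shed a dependency on the core during incremental separation is to declare the interface it needs and let the main project supply the implementation. The following is a minimal sketch of that idea, not a prescribed design; `TokenStore`, `Authenticator`, and `InMemoryTokenStore` are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod


class TokenStore(ABC):
    """Interface owned by the capability (hypothetical name)."""

    @abstractmethod
    def is_valid(self, token: str) -> bool:
        """Return True if the token is currently accepted."""


class Authenticator:
    """Capability code: depends only on the interface above."""

    def __init__(self, store: TokenStore) -> None:
        self._store = store

    def authenticate(self, token: str) -> bool:
        # Reject empty tokens before asking the store.
        return bool(token) and self._store.is_valid(token)


class InMemoryTokenStore(TokenStore):
    """Would live in the main project, which imports the capability."""

    def __init__(self, tokens: set[str]) -> None:
        self._tokens = tokens

    def is_valid(self, token: str) -> bool:
        return token in self._tokens


auth = Authenticator(InMemoryTokenStore({"secret"}))
print(auth.authenticate("secret"))  # True
print(auth.authenticate("wrong"))   # False
```

The import arrow points from the main project to the capability, never the other way, which is the property the dependency rules in this specification aim for.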

Repository Structure

The main repository should follow a standard Python project layout, with capabilities nested under a top-level directory like /capabilities. This keeps them visible but segregated.

Example structure for a main repo named my_project:

my_project/
├── pyproject.toml          # Main project config (Poetry/Hatch/setuptools)
├── README.md               # Overall docs
├── src/                    # Main source code
│   └── my_project/
│       ├── __init__.py
│       └── core_module.py
├── tests/                  # Main tests
│   └── test_core.py
├── capabilities/           # Container for proto-repos
│   ├── capability_a/       # Example capability (e.g., a reusable auth module)
│   │   ├── pyproject.toml  # Capability-specific config (for editable installs)
│   │   ├── README.md       # Docs for this capability
│   │   ├── src/
│   │   │   └── capability_a/
│   │   │       ├── __init__.py
│   │   │       └── auth.py
│   │   └── tests/
│   │       └── test_auth.py
│   └── capability_b/       # Another capability (e.g., data utils)
│       ├── pyproject.toml
│       ├── README.md
│       ├── src/
│       │   └── capability_b/
│       │       ├── __init__.py
│       │       └── utils.py
│       └── tests/
│           └── test_utils.py
└── setup.cfg               # Optional, if using setuptools

  • Consistency Rule: Each capability's subdirectory must replicate the main repo's conventions, e.g.:
    • Use the src/ layout, which works well with editable installs (PEP 660).
    • Same testing framework (e.g., pytest with consistent fixtures).
    • Uniform docstring styles (e.g., Google or NumPy format).
    • Shared linting rules (e.g., via a root .pre-commit-config.yaml).
  • Shared Resources: Place common utilities (e.g., a base test class) in the main repo initially, but refactor them into capabilities if they prove reusable.
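The consistency rule can be checked mechanically. The sketch below assumes the example layout shown above (each capability holds pyproject.toml, README.md, src/<name>/__init__.py, and tests/) and that the importable package shares the directory's name; `REQUIRED_ENTRIES`, `missing_entries`, and `check_layout` are hypothetical helper names.

```python
from pathlib import Path

# Entries every capability is expected to contain, following the example layout.
REQUIRED_ENTRIES = ("pyproject.toml", "README.md", "src", "tests")


def missing_entries(capability_dir: Path) -> list[str]:
    """Return the required entries that are absent from one capability."""
    missing = [e for e in REQUIRED_ENTRIES if not (capability_dir / e).exists()]
    # Assumption: the importable package is named after the directory.
    package_init = capability_dir / "src" / capability_dir.name / "__init__.py"
    if not package_init.is_file():
        missing.append(f"src/{capability_dir.name}/__init__.py")
    return missing


def check_layout(repo_root: Path) -> dict[str, list[str]]:
    """Map each non-conforming capability name to its missing entries."""
    capabilities = repo_root / "capabilities"
    if not capabilities.is_dir():
        return {}
    problems = {}
    for cap in sorted(p for p in capabilities.iterdir() if p.is_dir()):
        missing = missing_entries(cap)
        if missing:
            problems[cap.name] = missing
    return problems
```

Calling `check_layout(Path.cwd())` from the repository root returns an empty dict when every capability conforms, so it can back a simple CI gate.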

Dependency Guidelines

Dependencies are critical to ensure capabilities can be extracted without breakage. The core principle is unidirectional dependency flow: Capabilities may depend on each other or external libraries, but never on the parent (main) project. This prevents tight coupling and circular dependencies.
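Whether the capability graph is free of cycles can be tested with the standard library alone. This is a minimal sketch assuming the inter-capability dependencies have already been collected into a dict (for instance by reading each pyproject.toml); `extraction_order` and the sample graph are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def extraction_order(deps: dict[str, set[str]]) -> list[str]:
    """Return capabilities ordered so each comes after what it depends on.

    Raises ValueError if the dependency graph contains a cycle.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # The second element of args holds the nodes that form the cycle.
        cycle = " -> ".join(exc.args[1])
        raise ValueError(f"Circular capability dependency: {cycle}") from exc


# Hypothetical graph: capability_a depends on capability_b.
deps = {"capability_a": {"capability_b"}, "capability_b": set()}
print(extraction_order(deps))  # ['capability_b', 'capability_a']
```

The resulting order is also a reasonable order for extraction, since a capability's own dependencies come out first.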

  • Allowed Dependencies:

    • Capabilities can depend on external PyPI packages (e.g., requests for an API capability).
    • Inter-capability dependencies are permitted if acyclic (e.g., capability_a depends on capability_b, but not vice versa). Use a tool such as pipdeptree to visualize the installed dependency tree.
    • Use editable installs for local development: In the main pyproject.toml, add capabilities as dev dependencies like capability_a = { path = "./capabilities/capability_a", develop = true } (Poetry syntax).
  • Prohibited Dependencies:

    • No imports from the main project's modules into a capability (e.g., avoid from my_project.core_module import something in capability_a).
    • Avoid implicit dependencies like shared global configs; duplicate if needed, then refactor.
  • Incremental Dependency Checks:

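    • Static Analysis: Use a linter rule that bans specific imports, for example flake8-tidy-imports (its banned-modules option) or an import-linter "forbidden" contract, applied to the capabilities/ directory so that any import of my_project is flagged. Note that flake8-import-order only checks import ordering and cannot enforce this rule.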
    • Runtime Checks: Write a script (e.g., in /scripts/check_deps.py) using importlib to scan modules and assert no parent imports:
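      import importlib
      import inspect
      import sys
      from pathlib import Path

      def check_no_parent_imports(capability_dir: Path, parent_module: str = 'my_project') -> None:
          """Import each module under <capability_dir>/src; fail if it exposes names from the parent."""
          src_dir = capability_dir / 'src'
          sys.path.insert(0, str(src_dir))  # make the capability's packages importable
          for py_file in sorted(src_dir.rglob('*.py')):
              # src/capability_a/auth.py -> capability_a.auth; __init__.py -> the package itself
              parts = py_file.relative_to(src_dir).with_suffix('').parts
              if parts[-1] == '__init__':
                  parts = parts[:-1]
              if not parts:
                  continue
              module_name = '.'.join(parts)
              mod = importlib.import_module(module_name)
              for name, attr in list(vars(mod).items()):
                  # Imported modules carry __name__; functions and classes carry __module__
                  origin = attr.__name__ if inspect.ismodule(attr) else getattr(attr, '__module__', None)
                  if isinstance(origin, str) and origin.split('.')[0] == parent_module:
                      raise ValueError(f"Invalid import from parent in {module_name}: {name}")

      Run this via pre-commit hooks or CI. Because the script imports each module, any module-level side effects will execute during the check.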
    • Refactoring Process: Start with loose coupling—use interfaces/abstract base classes (from abc) for interactions. Gradually replace parent dependencies with capability-internal implementations or new sub-capabilities.
    • Tooling Integration: Poetry does not ship a workspace command out of the box (third-party plugins exist), so link capabilities through path dependencies with develop = true, or use Hatch's environments to manage multi-package builds. For checks, integrate safety for vulnerability scans and bandit for security issues across all subdirs.
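The parent-import rule can also be checked statically, without importing any capability code. The sketch below parses each file with the standard ast module; it assumes the main package is named my_project, and `find_parent_imports` and `scan_capability` are hypothetical helper names. Relative imports are skipped on the assumption that they stay inside the capability's own package.

```python
import ast
from pathlib import Path


def find_parent_imports(source: str, parent_module: str = "my_project") -> list[tuple[int, str]]:
    """Return (line number, imported module) for each import of the parent package."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            # Compare the top-level package only, so my_project_extras is not flagged.
            if name.split(".")[0] == parent_module:
                hits.append((node.lineno, name))
    return sorted(hits)


def scan_capability(capability_dir: Path, parent_module: str = "my_project") -> list[str]:
    """Scan every .py file under <capability_dir>/src without importing it."""
    report = []
    for py_file in sorted((capability_dir / "src").rglob("*.py")):
        source = py_file.read_text(encoding="utf-8")
        for lineno, name in find_parent_imports(source, parent_module):
            report.append(f"{py_file}:{lineno}: imports {name}")
    return report
```

A non-empty report from `scan_capability` can fail a pre-commit hook. Unlike the runtime check, this approach also catches imports placed inside functions, but it will miss dynamic imports made through importlib.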

Extraction Process

To promote a capability to its own repository:

  1. Maturity Check: Ensure 80%+ test coverage (via pytest --cov, provided by the pytest-cov plugin), full docs, and no parent dependencies (via checks above).
  2. Git Split: Use git subtree split -P capabilities/capability_a -b capability_a_branch to create a branch with only that subdir's history.
  3. New Repo Setup: Push the branch to a new Git repo. After the split, the capability's own pyproject.toml sits at the top level of the new repo; review it, then publish to PyPI if reusable.
  4. Update Main Repo: Replace the subdir with a dependency (e.g., pip install git+https://github.com/user/capability_a.git or from PyPI). Remove the subdir and commit.
  5. Automation: Script the process with a tool like cookiecutter for templating new capabilities.
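The git steps above can be scripted. This sketch only builds the commands as argument lists and prints them, so nothing is executed by accident; the remote URL, the target branch name main, and the commit message are assumptions, and `extraction_commands` is a hypothetical helper.

```python
import shlex


def extraction_commands(capability: str, remote_url: str) -> list[list[str]]:
    """Build the git commands for steps 2-4 as argument lists (nothing is executed)."""
    prefix = f"capabilities/{capability}"
    branch = f"{capability}_branch"
    return [
        # Step 2: create a branch containing only the subdirectory's history.
        ["git", "subtree", "split", "-P", prefix, "-b", branch],
        # Step 3: push that branch to the new repository (target branch name assumed).
        ["git", "push", remote_url, f"{branch}:main"],
        # Step 4: remove the subdirectory from the main repository and commit.
        ["git", "rm", "-r", prefix],
        ["git", "commit", "-m", f"Extract {capability} into its own repository"],
    ]


# Placeholder URL for illustration only.
for cmd in extraction_commands("capability_a", "git@example.com:user/capability_a.git"):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the printed commands look right. Adding the extracted package back as a dependency of the main project is left out because it depends on the packaging tool in use.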

Best Practices for Python Implementation

  • Versioning: Use semantic versioning in each capability's pyproject.toml. Start with 0.1.0.dev0 for proto-repos (the normalized PEP 440 form of 0.1.0-dev).
  • Testing: Run tests monolithically initially (pytest .) but isolate per capability (pytest capabilities/capability_a).
  • CI/CD: Use GitHub Actions or GitLab CI with matrix jobs to build/test each capability separately.
  • Documentation: Use Sphinx for unified docs, cross-referencing each capability's documentation via intersphinx once it is built as a separate project.
  • Migration Path: For existing projects, start by moving one module at a time into a new capability subdir, updating imports incrementally.
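Isolated test runs per capability can be driven by a small script. The sketch below assumes pytest is installed in the active environment and that each capability keeps its tests in a tests/ folder; `capabilities_with_tests` and `run_isolated` are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path


def capabilities_with_tests(repo_root: Path) -> list[Path]:
    """Return capability directories that contain a tests/ folder, sorted by name."""
    base = repo_root / "capabilities"
    if not base.is_dir():
        return []
    return sorted(p for p in base.iterdir() if (p / "tests").is_dir())


def run_isolated(repo_root: Path) -> dict[str, int]:
    """Run pytest once per capability and collect each exit code."""
    results = {}
    for cap in capabilities_with_tests(repo_root):
        # A separate process per capability keeps import state from leaking between runs.
        proc = subprocess.run([sys.executable, "-m", "pytest", str(cap)])
        results[cap.name] = proc.returncode
    return results
```

Running `run_isolated(Path.cwd())` from the repository root yields one exit code per capability. The same list of directories can feed a CI matrix so that each capability is built and tested as its own job.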

Assessment of Strengths and Problems

Strengths

  • Flexibility in Development: Enables rapid prototyping in a single repo, reducing context-switching. In Python, tools like Poetry make it easy to treat subdirs as editable packages, speeding up iteration.
  • Reusability Promotion: Encourages identifying and isolating capabilities early, leading to a library ecosystem. For example, a utility born in one project can be extracted and reused across others without duplication.
  • Maintainability: Consistent structure simplifies onboarding and refactoring. Dependency checks prevent spaghetti code, aligning with Python's "explicit is better than implicit" zen.
  • Scalability: Starts small but supports growth. Mature Python projects often follow similar patterns (e.g., Django's apps), which eases the transition to micro-libraries.
  • Cost-Effective: Avoids premature repo proliferation, saving on CI minutes and management overhead.

Problems to Watch Out For

  • Dependency Creep: Despite guidelines, parent dependencies might sneak in during quick hacks. Mitigation: Enforce checks in pre-commit hooks and review PRs for import patterns.
  • Performance Overhead: In large repos, building and testing all capabilities together can be slow (e.g., running poetry install in every subdir). Mitigation: Use selective commands (e.g., poetry run pytest capabilities/capability_a) or tools like tox for parallel environments.
  • History Fragmentation: Git splits preserve history, but merging changes post-extraction requires care. Mitigation: Use git subtree pull and git subtree push for bidirectional sync if needed temporarily.
  • Tooling Limitations: Not all Python tools handle nested projects seamlessly (e.g., older setuptools might ignore sub-pyproject.toml). Mitigation: Stick to modern tools like Hatch, or Poetry 1.2+ with path dependencies.
  • Over-Modularization: Risk of creating too many tiny capabilities, leading to dependency hell. Mitigation: Set criteria for extraction (e.g., used in 3+ places, >500 LOC) and merge if underused.
  • Team Adoption: In collaborative settings, inconsistent adherence could cause issues. Mitigation: Document in CONTRIBUTING.md and use linters to automate enforcement.
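The extraction criteria mentioned under Over-Modularization can be turned into a quick measurement. This is a rough sketch using the example thresholds from that bullet (more than 500 LOC, used in 3 or more places); `count_loc` and `ready_for_extraction` are hypothetical names, docstring lines are counted as code, and the number of usage sites has to be supplied by the caller.

```python
from pathlib import Path


def count_loc(capability_dir: Path) -> int:
    """Count non-blank, non-comment lines in the capability's src/ tree."""
    total = 0
    for py_file in (capability_dir / "src").rglob("*.py"):
        for line in py_file.read_text(encoding="utf-8").splitlines():
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                total += 1
    return total


def ready_for_extraction(loc: int, usage_sites: int,
                         min_loc: int = 500, min_sites: int = 3) -> bool:
    """Apply the example thresholds: more than min_loc lines and at least min_sites uses."""
    return loc > min_loc and usage_sites >= min_sites
```

Capabilities that stay well below both thresholds over time are candidates for merging back into the core or into a neighbouring capability.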