Composable-Repository Paradigm Specification
Overview
The composable-repository paradigm is a software development approach designed to facilitate incremental modularity in projects, particularly for Python applications. It addresses the common challenge where developers identify reusable "capabilities" (self-contained functionalities including code, tests, documentation, and configurations) during active development but face friction in extracting them into independent repositories. By organizing these capabilities as subdirectories within the main repository—mirroring the main repo's structure—this paradigm enables gradual separation without disrupting the core project. The goal is to maintain a compact, cohesive codebase while preparing for reusability, promoting principles like separation of concerns, DRY (Don't Repeat Yourself), and eventual library extraction.
This specification is tailored for Python projects, leveraging ecosystem tools like Poetry, setuptools, or Hatch for dependency management and packaging. It assumes a monorepo-like starting point where the main project evolves, and capabilities are nurtured until ready for independence.
Key Concepts
- Capability: A modular unit of functionality that could stand alone as a library or package. Examples include a custom logging handler, a data validation module, or a utility for API interactions. Each capability includes:
  - Source code (e.g., Python modules/classes).
  - Tests (e.g., using pytest).
  - Documentation (e.g., README.md, docstrings).
  - Configurations (e.g., pyproject.toml snippets, requirements files).
- Subdirectory as Proto-Repository: Each capability lives in a dedicated subdirectory (e.g., /capabilities/my_capability) that mimics the main repository's structure. This ensures consistency and eases eventual extraction.
- Incremental Separation: Capabilities start intertwined with the core but are refactored over time to reduce dependencies, checked via tools, until they can be split into standalone repositories (e.g., via Git subtree split).
Repository Structure
The main repository should follow a standard Python project layout, with capabilities nested under a top-level directory like /capabilities. This keeps them visible but segregated.
Example structure for a main repo named my_project:
my_project/
├── pyproject.toml              # Main project config (Poetry/Hatch/setuptools)
├── README.md                   # Overall docs
├── src/                        # Main source code
│   └── my_project/
│       ├── __init__.py
│       └── core_module.py
├── tests/                      # Main tests
│   └── test_core.py
├── capabilities/               # Container for proto-repos
│   ├── capability_a/           # Example capability (e.g., a reusable auth module)
│   │   ├── pyproject.toml      # Capability-specific config (for editable installs)
│   │   ├── README.md           # Docs for this capability
│   │   ├── src/
│   │   │   └── capability_a/
│   │   │       ├── __init__.py
│   │   │       └── auth.py
│   │   └── tests/
│   │       └── test_auth.py
│   └── capability_b/           # Another capability (e.g., data utils)
│       ├── pyproject.toml
│       ├── README.md
│       ├── src/
│       │   └── capability_b/
│       │       ├── __init__.py
│       │       └── utils.py
│       └── tests/
│           └── test_utils.py
└── setup.cfg                   # Optional, if using setuptools
- Consistency Rule: Each capability's subdirectory must replicate the main repo's conventions, e.g.:
  - Use the `src/` layout, which works cleanly with PEP 660 editable installs.
  - Same testing framework (e.g., pytest with consistent fixtures).
  - Uniform docstring styles (e.g., Google or NumPy format).
  - Shared linting rules (e.g., via a root `.pre-commit-config.yaml`).
- Shared Resources: Place common utilities (e.g., a base test class) in the main repo initially, but refactor them into capabilities if they prove reusable.
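The consistency rule can be checked mechanically. The following is a minimal sketch, assuming the layout from the example tree above; the function names and the `REQUIRED_ENTRIES` list are illustrative choices, not part of any existing tool.

```python
from pathlib import Path

# Entries every capability is assumed to contain (taken from the example tree).
REQUIRED_ENTRIES = ("pyproject.toml", "README.md", "src", "tests")


def missing_layout_entries(capability_dir: Path) -> list[str]:
    """Return the required entries that are absent from one capability."""
    missing = [e for e in REQUIRED_ENTRIES if not (capability_dir / e).exists()]
    # The src/ layout expects a package named after the capability itself.
    package_init = capability_dir / "src" / capability_dir.name / "__init__.py"
    if not package_init.is_file():
        missing.append(f"src/{capability_dir.name}/__init__.py")
    return missing


def check_all_capabilities(repo_root: Path) -> dict[str, list[str]]:
    """Map each non-conforming capability name to its missing entries."""
    problems: dict[str, list[str]] = {}
    capabilities_root = repo_root / "capabilities"
    if not capabilities_root.is_dir():
        return problems
    for capability_dir in sorted(p for p in capabilities_root.iterdir() if p.is_dir()):
        missing = missing_layout_entries(capability_dir)
        if missing:
            problems[capability_dir.name] = missing
    return problems
```

An empty result from `check_all_capabilities(Path("."))` would indicate that every capability follows the assumed layout; a pre-commit hook or CI step could fail on anything else.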
Dependency Guidelines
Dependencies are critical to ensure capabilities can be extracted without breakage. The core principle is unidirectional dependency flow: Capabilities may depend on each other or external libraries, but never on the parent (main) project. This prevents tight coupling and circular dependencies.
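The "no circular dependencies" requirement can be verified with a topological sort. The sketch below uses the standard-library `graphlib` module (Python 3.9+); the `dependencies` mapping is a hypothetical input that a real script would have to build from each capability's `pyproject.toml`.

```python
from graphlib import CycleError, TopologicalSorter


def extraction_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order capabilities so that each one follows the capabilities it depends on.

    `dependencies` maps a capability name to the names it depends on.
    Raises ValueError if the inter-capability graph contains a cycle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # The second element of exc.args is the list of nodes forming the cycle.
        cycle = " -> ".join(exc.args[1])
        raise ValueError(f"Circular capability dependency: {cycle}") from exc


if __name__ == "__main__":
    graph = {"capability_a": {"capability_b"}, "capability_b": set()}
    print(extraction_order(graph))
```

The resulting order is also a reasonable order in which to extract capabilities, since a capability's dependencies come before it.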
- Allowed Dependencies:
  - Capabilities can depend on external PyPI packages (e.g., `requests` for an API capability).
  - Inter-capability dependencies are permitted if acyclic (e.g., `capability_a` depends on `capability_b`, but not vice versa). Use a tool like `pipdeptree` to visualize the installed dependency tree.
  - Use editable installs for local development: In the main `pyproject.toml`, add capabilities as dev dependencies like `capability_a = { path = "./capabilities/capability_a", develop = true }` (Poetry syntax).
- Prohibited Dependencies:
  - No imports from the main project's modules into a capability (e.g., avoid `from my_project.core_module import something` in `capability_a`).
  - Avoid implicit dependencies like shared global configs; duplicate if needed, then refactor.
- Incremental Dependency Checks:
  - Static Analysis: Use `pylint` or `flake8` with a plugin that can ban specific imports (e.g., the `banned-modules` option of `flake8-tidy-imports`) to enforce import rules. Configure it to flag any import of `my_project` from inside a capability.
  - Runtime Checks: Write a script (e.g., in `/scripts/check_deps.py`) that uses `importlib` to import each capability module and assert that no name in its namespace originates from the parent package:

        import importlib
        import sys
        from pathlib import Path

        PARENT_MODULE = "my_project"  # Adjust to your main package

        def check_no_parent_imports(capability_dir: Path) -> None:
            """Import every module of a capability; fail if any name comes from the parent."""
            src_dir = capability_dir / "src"
            sys.path.insert(0, str(src_dir))  # make the capability importable
            try:
                for py_file in src_dir.rglob("*.py"):
                    # Build the dotted name relative to src/, e.g. capability_a.auth
                    parts = py_file.relative_to(src_dir).with_suffix("").parts
                    if parts[-1] == "__init__":
                        parts = parts[:-1]
                    module_name = ".".join(parts)
                    mod = importlib.import_module(module_name)
                    for name in dir(mod):
                        origin = getattr(getattr(mod, name), "__module__", None) or ""
                        if origin == PARENT_MODULE or origin.startswith(PARENT_MODULE + "."):
                            raise ValueError(f"Invalid import from parent in {module_name}: {name}")
            finally:
                sys.path.remove(str(src_dir))

    Run this via pre-commit hooks or CI. Note that it executes each module on import and only sees objects that carry a `__module__` attribute, so it complements static analysis rather than replacing it.
  - Refactoring Process: Start with loose coupling: use interfaces/abstract base classes (from `abc`) for interactions. Gradually replace parent dependencies with capability-internal implementations or new sub-capabilities.
  - Tooling Integration: Poetry has no built-in workspace command, so manage multi-package builds with path dependencies (as in the editable-install example above) or with Hatch's environments. For checks, integrate `safety` for vulnerability scans and `bandit` for security issues across all subdirs.
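The parent-import rule can also be enforced without importing any code, by parsing each file with the standard-library `ast` module. This is a minimal sketch, assuming the parent package is named `my_project`; `find_parent_imports` is an illustrative name. It only inspects absolute imports, so dynamic imports (e.g., via `importlib`) would go unnoticed.

```python
import ast
from pathlib import Path

PARENT_PACKAGE = "my_project"  # assumption: name of the main package


def _is_parent(module_name: str) -> bool:
    """True for the parent package itself or any of its submodules."""
    return module_name == PARENT_PACKAGE or module_name.startswith(PARENT_PACKAGE + ".")


def find_parent_imports(capability_dir: Path) -> list[str]:
    """Return 'file:line' entries for every absolute import of the parent package."""
    violations = []
    for py_file in sorted(capability_dir.rglob("*.py")):
        tree = ast.parse(py_file.read_text(encoding="utf-8"), filename=str(py_file))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0:
                names = [node.module or ""]
            else:
                continue
            if any(_is_parent(n) for n in names):
                relative = py_file.relative_to(capability_dir).as_posix()
                violations.append(f"{relative}:{node.lineno}")
    return violations
```

Because nothing is executed, this variant is safe to run on capabilities whose modules have import-time side effects or uninstalled dependencies.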
Extraction Process
To promote a capability to its own repository:
- Maturity Check: Ensure 80%+ test coverage (via `pytest --cov`), full docs, and no parent dependencies (via checks above).
- Git Split: Use `git subtree split -P capabilities/capability_a -b capability_a_branch` to create a branch with only that subdir's history.
- New Repo Setup: Push the branch to a new Git repo, confirm that the capability's `pyproject.toml` now sits at the repository root, and publish to PyPI if reusable.
- Update Main Repo: Replace the subdir with a dependency (e.g., `pip install git+https://github.com/user/capability_a.git` or from PyPI). Remove the subdir and commit.
- Automation: Script the process, and use a tool like `cookiecutter` for templating new capabilities.
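The Git steps above could be scripted roughly as follows. This is a sketch under stated assumptions: the target branch name `main`, the commit message, and the function names are illustrative, and the dependency update in the main `pyproject.toml` is left as a manual step. By default the script only prints the commands.

```python
import shlex
import subprocess


def extraction_commands(capability: str, new_repo_url: str) -> list[list[str]]:
    """Build the git commands that split a capability out of the main repo."""
    prefix = f"capabilities/{capability}"
    branch = f"{capability}_branch"
    return [
        # Create a branch containing only the capability's history.
        ["git", "subtree", "split", "-P", prefix, "-b", branch],
        # Push that branch to the new repository (target branch name is assumed).
        ["git", "push", new_repo_url, f"{branch}:main"],
        # Remove the subdirectory from the main repo and record the change.
        ["git", "rm", "-r", prefix],
        ["git", "commit", "-m", f"Extract {capability} into its own repository"],
    ]


def run_extraction(capability: str, new_repo_url: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for command in extraction_commands(capability, new_repo_url):
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

Running with `dry_run=True` first gives a chance to review the exact commands before anything touches the repository.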
Best Practices for Python Implementation
- Versioning: Use semantic versioning in each capability's `pyproject.toml`. Start with a development version such as `0.1.0.dev0` (the PEP 440 spelling of `0.1.0-dev`) for proto-repos.
- Testing: Run tests monolithically at first (`pytest .`), but make sure each capability's suite also passes in isolation (`pytest capabilities/capability_a`).
- CI/CD: Use GitHub Actions or GitLab CI with matrix jobs to build/test each capability separately.
- Documentation: Use Sphinx for unified docs, with sub-projects included via `intersphinx`.
- Migration Path: For existing projects, start by moving one module at a time into a new capability subdir, updating imports incrementally.
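For the per-capability CI jobs mentioned above, the list of capabilities can be discovered rather than hard-coded. The following sketch emits a JSON object that a matrix job could consume; treating "has a `pyproject.toml`" as the definition of a capability, and the key name `capability`, are assumptions made for illustration.

```python
import json
from pathlib import Path


def discover_capabilities(repo_root: Path) -> list[str]:
    """Return the names of subdirectories of capabilities/ that have a pyproject.toml."""
    root = repo_root / "capabilities"
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if (p / "pyproject.toml").is_file())


def ci_matrix(repo_root: Path) -> str:
    """Serialize the discovered capabilities as a JSON matrix definition."""
    return json.dumps({"capability": discover_capabilities(repo_root)})


if __name__ == "__main__":
    print(ci_matrix(Path(".")))
```

Each matrix entry could then run `pytest capabilities/<name>` so that a failure is attributed to a single capability.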
Assessment of Strengths and Problems
Strengths
- Flexibility in Development: Enables rapid prototyping in a single repo, reducing context-switching. In Python, tools like Poetry make it easy to treat subdirs as editable packages, speeding up iteration.
- Reusability Promotion: Encourages identifying and isolating capabilities early, leading to a library ecosystem. For example, a utility born in one project can be extracted and reused across others without duplication.
- Maintainability: Consistent structure simplifies onboarding and refactoring. Dependency checks prevent spaghetti code, aligning with Python's "explicit is better than implicit" zen.
- Scalability: Starts small but supports growth; mature Python projects often follow similar patterns (Django's apps, for example), easing the transition to micro-libraries.
- Cost-Effective: Avoids premature repo proliferation, saving on CI minutes and management overhead.
Problems to Watch Out For
- Dependency Creep: Despite guidelines, parent dependencies might sneak in during quick hacks. Mitigation: Enforce checks in pre-commit hooks and review PRs for import patterns.
- Performance Overhead: In large repos, building/testing all capabilities together can be slow (e.g., `poetry install` across subdirs). Mitigation: Use selective commands (e.g., `poetry run pytest capabilities/*`) or tools like `tox` for parallel environments.
- History Fragmentation: Git splits preserve history, but merging changes post-extraction requires care. Mitigation: Use `git subtree` for bidirectional sync if needed temporarily.
- Tooling Limitations: Not all Python tools handle nested projects seamlessly (e.g., older setuptools might ignore sub-pyproject.toml). Mitigation: Stick to modern tools like Hatch, or Poetry 1.2+ with path dependencies.
- Over-Modularization: Risk of creating too many tiny capabilities, leading to dependency hell. Mitigation: Set criteria for extraction (e.g., used in 3+ places, >500 LOC) and merge if underused.
- Team Adoption: In collaborative settings, inconsistent adherence could cause issues. Mitigation: Document in CONTRIBUTING.md and use linters to automate enforcement.
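The extraction criteria suggested under Over-Modularization can be turned into a simple gate. This is a rough sketch: the thresholds mirror the example figures above (used in 3+ places, more than 500 LOC), the LOC count skips blank lines and full-line comments only, and `usage_count` is assumed to be supplied by the caller.

```python
from pathlib import Path

MIN_LOC = 500     # example threshold from the criteria above
MIN_USAGES = 3    # example threshold from the criteria above


def count_loc(capability_dir: Path) -> int:
    """Count non-blank, non-comment lines in the capability's src/ directory."""
    total = 0
    for py_file in (capability_dir / "src").rglob("*.py"):
        for line in py_file.read_text(encoding="utf-8").splitlines():
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                total += 1
    return total


def ready_for_extraction(capability_dir: Path, usage_count: int) -> bool:
    """Apply both criteria: enough call sites and enough code to justify a repo."""
    return usage_count >= MIN_USAGES and count_loc(capability_dir) > MIN_LOC
```

Capabilities that stay below both thresholds for a long time are candidates for merging back into the main package.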