docs(roadmap): add workplan for extracting llm module as shared library

3-stage plan: decouple (RunConfig/LLMResponse move + app name parameterization) → extract to standalone package → adopt in first consumer. Registered as workstream in Custodian State Hub.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 21:51:54 +01:00
parent e9dc9a8517
commit eaf4a955af
1 changed files with 214 additions and 0 deletions
--- a/roadmap/llm-shared-library/PLAN.md
+++ b/roadmap/llm-shared-library/PLAN.md
@@ -0,0 +1,214 @@
+# LLM Adapter Layer — Extract as Shared Library
+
+## Vision
+
+The `markitect.llm` module is a clean, stdlib-only adapter layer for calling
+LLMs via OpenRouter, Gemini, OpenAI, and the Claude Code CLI. It implements a
+uniform interface, a 7-layer TOML config chain, embedding support with caching,
+and typed exceptions. It should be usable by all projects in the Bernd Worsch
+ecosystem without pulling in all of markitect.
+
+This roadmap tracks extracting it into a standalone installable library.
+
+---
+
+## Current State
+
+The module lives at `markitect/llm/` (~16 files, ~1500 LOC, stdlib-only) and
+provides:
+- **4 text adapters**: OpenRouter, Gemini, OpenAI, Claude Code CLI
+- **2 embedding adapters**: OpenAI-compatible (OpenAI + OpenRouter)
+- **Embedding cache**: JSON-backed, content-digest validated
+- **Similarity utilities**: pure-Python cosine similarity, matrix, pair-finding
+- **7-layer TOML config chain**: CLI > env > four TOML layers (user/dir × preference/default) > hardcoded
+- **Typed exceptions**: LLMError hierarchy
+- **HTTP wrapper**: urllib-only, typed exception translation
+
+### Two Coupling Issues Blocking Clean Extraction
+
+| Issue | Location | Severity |
+|-------|----------|----------|
+| `RunConfig` and `LLMResponse` are defined in `markitect.prompts.execution.models`, not in `markitect.llm` | `markitect/prompts/execution/models.py` | High — creates cross-module import for all consumers |
+| TOML config chain hardcodes `"markitect"` as app name (paths: `~/.config/markitect/`, env prefix `MARKITECT_`, files: `.markitect.toml`) | `markitect/llm/toml_config.py` | Medium — consumers either accept markitect config or can't use the chain |
+
+---
+
+## Terminology
+
+- **adapter**: concrete implementation of `LLMAdapter` for a single provider
+- **factory**: `create_adapter()` / `create_embedding_adapter()` — provider-agnostic entry points
+- **config chain**: 7-layer resolution of provider + model (CLI → env → TOML → hardcoded)
+- **standalone library**: a Python package installable with `pip install` from a git URL or local path, without PyPI
+- **consumer**: any project that imports and uses the library (markitect itself, custodian, railiance, etc.)
+
+---
+
+## Packaging Decision (Pending)
+
+Before Stage 2 starts, one architectural decision must be resolved:
+
+> **D1: Where does the extracted library live?**
+>
+> **Option A — Standalone repo** (`~/bw-llm` or similar):
+> - Clean separation, versioned independently, installable via `pip install git+file:///...` or git URL
+> - Adds a repo to maintain; changes require bumping version in dependents
+>
+> **Option B — Subfolder of markitect with own `pyproject.toml`** (monorepo-lite):
+> - Stays co-located with the main codebase that will use it most
+> - Less friction for iteration; single git history
+> - Slightly unorthodox but valid for personal infrastructure
+>
+> **Option C — Just `pip install markitect` in other projects**:
+> - Zero extraction work; reuse today
+> - Pulls all of markitect (prompts, infospace, CLI, etc.) as transitive deps
+> - Acceptable short-term if other projects are small
+
+---
+
+## Stages
+
+### Stage 1 — Decouple (within markitect)
+
+Prepare the module for extraction without changing its public API.
+
+#### S1.1 — Move RunConfig + LLMResponse into markitect.llm
+
+`RunConfig` and `LLMResponse` are currently in `markitect.prompts.execution.models`.
+The LLM adapters import from there, creating a hard dependency on the prompt system.
+
+**Work:**
+- Move both dataclasses to `markitect/llm/models.py`
+- Update all imports in `markitect.llm` and `markitect.prompts`
+- Keep a re-export shim in `markitect.prompts.execution.models` for backwards compat
+
+**Acceptance:** `markitect/llm/` has zero imports from `markitect.prompts.*`
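+
+The backwards-compat shim can be illustrated with a minimal, self-contained sketch.
+It is an illustration only: the module names `demo_llm_models` and
+`demo_prompts_models` are stand-ins for `markitect/llm/models.py` and
+`markitect/prompts/execution/models.py`, and the dataclass fields are hypothetical,
+since the real fields are not listed in this plan.
+
+```python
+import sys
+import types
+from dataclasses import dataclass
+
+
+@dataclass
+class RunConfig:
+    model: str = "example-model"  # hypothetical field, for illustration only
+
+
+@dataclass
+class LLMResponse:
+    text: str = ""  # hypothetical field, for illustration only
+
+
+# Stand-in for the new home of the dataclasses (markitect/llm/models.py).
+new_home = types.ModuleType("demo_llm_models")
+new_home.RunConfig = RunConfig
+new_home.LLMResponse = LLMResponse
+sys.modules["demo_llm_models"] = new_home
+
+# Stand-in for the shim left behind at the old location.
+# A real shim would contain just the single import line executed here.
+shim = types.ModuleType("demo_prompts_models")
+exec("from demo_llm_models import RunConfig, LLMResponse", shim.__dict__)
+sys.modules["demo_prompts_models"] = shim
+
+# Importing through the old path yields the very same class objects.
+from demo_prompts_models import LLMResponse as OldLLMResponse
+from demo_prompts_models import RunConfig as OldRunConfig
+
+same_objects = OldRunConfig is RunConfig and OldLLMResponse is LLMResponse
+```
+
+Because the shim re-exports rather than redefines the classes, `isinstance` checks
+keep working across old and new import paths.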
+
+#### S1.2 — Parameterize the TOML config chain
+
+Replace the hardcoded `"markitect"` app name with a configurable `app_name` parameter.
+
+**Work:**
+- Add `app_name: str = "markitect"` parameter to `resolve_llm()` and the config
+  path helpers in `toml_config.py`
+- Derive config file path (`~/.config/{app_name}/config.toml`), env prefix
+  (`{APP_NAME}_`, e.g. `{APP_NAME}_HELPER_MODEL`), and local config file
+  (`.{app_name}.toml`) from it
+- All existing behaviour is preserved when `app_name="markitect"` (default)
+
+**Acceptance:** A consumer can call `resolve_llm(app_name="railiance")` and get
+config from `~/.config/railiance/config.toml` and `RAILIANCE_HELPER_MODEL`.
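+
+A minimal sketch of the derivation, assuming a helper named `config_locations()`
+and a `ConfigLocations` container. Both names are hypothetical and not part of
+`toml_config.py`; mapping hyphens to underscores in the env prefix is also an
+assumption.
+
+```python
+import os
+from dataclasses import dataclass
+from pathlib import Path
+
+
+@dataclass(frozen=True)
+class ConfigLocations:
+    """Everything the config chain derives from the app name (hypothetical type)."""
+
+    user_config: Path        # e.g. ~/.config/<app_name>/config.toml
+    local_config_name: str   # e.g. .<app_name>.toml
+    env_prefix: str          # e.g. <APP_NAME>_
+
+
+def config_locations(app_name: str = "markitect") -> ConfigLocations:
+    """Derive config paths and env prefix from a single app name."""
+    home = Path(os.path.expanduser("~"))
+    return ConfigLocations(
+        user_config=home / ".config" / app_name / "config.toml",
+        local_config_name=f".{app_name}.toml",
+        # Assumption: hyphens are not valid in env var names, so map them to "_".
+        env_prefix=app_name.upper().replace("-", "_") + "_",
+    )
+
+
+railiance = config_locations("railiance")
+helper_model_var = railiance.env_prefix + "HELPER_MODEL"
+```
+
+Keeping `"markitect"` as the default means existing callers see no change in
+behaviour.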
+
+#### S1.3 — Isolation tests
+
+Write a test file that imports only from `markitect.llm.*` and verifies no
+accidental coupling remains.
+
+**Acceptance:** `pytest tests/test_llm_isolation.py` passes; no import of
+`markitect.prompts` or `markitect.infospace` in the LLM module tree.
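+
+One way to check the second half of this criterion is a static scan of import
+statements, sketched below. The helper names `forbidden_imports()` and
+`scan_tree()` are hypothetical, and the sketch only covers absolute imports;
+relative imports (`from ..prompts import x`) would need extra handling.
+
+```python
+import ast
+from pathlib import Path
+
+FORBIDDEN = ("markitect.prompts", "markitect.infospace")
+
+
+def _is_forbidden(name: str) -> bool:
+    """True if `name` is a forbidden package or a module inside one."""
+    return any(name == p or name.startswith(p + ".") for p in FORBIDDEN)
+
+
+def forbidden_imports(source: str) -> list[str]:
+    """Return the forbidden modules imported (absolutely) by `source`."""
+    found: list[str] = []
+    for node in ast.walk(ast.parse(source)):
+        if isinstance(node, ast.Import):
+            candidates = [alias.name for alias in node.names]
+        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
+            if _is_forbidden(node.module):
+                candidates = [node.module]
+            else:
+                # Catches `from markitect import prompts`.
+                candidates = [f"{node.module}.{a.name}" for a in node.names]
+        else:
+            continue
+        found.extend(c for c in candidates if _is_forbidden(c))
+    return found
+
+
+def scan_tree(root: Path) -> dict[str, list[str]]:
+    """Map each .py file under `root` to its forbidden imports, if any."""
+    hits: dict[str, list[str]] = {}
+    for path in sorted(root.rglob("*.py")):
+        bad = forbidden_imports(path.read_text(encoding="utf-8"))
+        if bad:
+            hits[str(path)] = bad
+    return hits
+```
+
+A test in `tests/test_llm_isolation.py` could then assert that
+`scan_tree(Path("markitect/llm"))` returns an empty dict.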
+
+---
+
+### Stage 2 — Extract
+
+#### S2.1 — Resolve D1: packaging location
+
+Record the decision and create the package scaffold.
+
+**Acceptance:** D1 resolved, `pyproject.toml` for the library exists at the
+chosen location with name, version `0.1.0`, and declared dependencies.
+
+#### S2.2 — Create standalone package
+
+Move (or symlink) the llm module into the new package structure. Wire up
+the `pyproject.toml` entry points. Verify `pip install -e <path>` works.
+
+**Files to carry over:**
+```
+llm/
+  __init__.py          # re-exports: create_adapter, create_embedding_adapter,
+                       #   LLMAdapter, EmbeddingAdapter, LLMConfig, exceptions
+  models.py            # RunConfig, LLMResponse (moved from S1.1)
+  config.py            # load_config, resolve_api_key
+  toml_config.py       # resolve_llm (parameterized from S1.2)
+  factory.py           # create_adapter
+  exceptions.py        # LLM exception hierarchy
+  openrouter.py
+  claude_code.py
+  gemini.py
+  openai.py
+  embedding_adapter.py
+  embedding_openai.py
+  embedding_factory.py # create_embedding_adapter
+  embedding_cache.py
+  similarity.py
+  _http.py
+  _token_estimator.py
+```
+
+**Acceptance:** `python -c "from bw_llm import create_adapter; print('ok')"` (or the
+equivalent import under the final package name) works in a fresh venv with only
+the new package installed.
+
+#### S2.3 — Update markitect to depend on extracted package
+
+Replace `markitect/llm/` with an import alias pointing to the new package, or
+add the package as a path dependency in markitect's `pyproject.toml`.
+
+**Acceptance:** All markitect tests pass; `markitect/llm/__init__.py` is either
+removed or becomes a thin re-export of `bw_llm`.
+
+#### S2.4 — Integration smoke test
+
+Run the full markitect infospace pipeline (entity extraction + evaluation) end-to-end
+against a small fixture to confirm nothing broke.
+
+**Acceptance:** `markitect infospace evaluate --dry-run` succeeds on a 3-entity fixture.
+
+---
+
+### Stage 3 — Adopt in First Consumer
+
+#### S3.1 — Integrate in one other project
+
+Pick the first real consumer (likely the custodian state-hub, for LLM-assisted
+state summaries or decision rationale generation) and wire up the library.
+
+**Work:**
+- Add `bw-llm` (or equivalent) as a dependency
+- Write a small usage example (e.g., `llm_helper.py`)
+- Confirm config chain works with the consumer's own app name
+
+#### S3.2 — Usage guide
+
+Write `README.md` for the library covering:
+- Installation (local path / git URL)
+- Supported providers and env vars
+- TOML config file locations and format
+- `create_adapter()` / `create_embedding_adapter()` quick-start
+- Error handling
+
+**Acceptance:** Another developer (or agent) can follow the README to use the library
+in a new project without reading source code.
+
+---
+
+## Stage Summary
+
+| Stage | Description | Key Deliverable | Blocks |
+|-------|-------------|-----------------|--------|
+| S1.1 | Move RunConfig/LLMResponse to llm | Zero cross-module deps | S2.2 |
+| S1.2 | Parameterize app name | Configurable config chain | S2.2 |
+| S1.3 | Isolation tests | Green test suite | S2.1 |
+| S2.1 | Resolve packaging decision (D1) | pyproject.toml scaffold | S2.2 |
+| S2.2 | Create standalone package | `pip install` works | S2.3 |
+| S2.3 | Update markitect | markitect uses extracted lib | S2.4 |
+| S2.4 | Integration smoke test | Full pipeline passes | S3.1 |
+| S3.1 | First consumer integration | Library used in real project | S3.2 |
+| S3.2 | Usage guide | README published | — |
+
+---
+
+## Out of Scope
+
+- Publishing to PyPI (unnecessary for personal infrastructure; git/local installs suffice)
+- Adding new LLM providers (separate concern)
+- Porting the helper CLI to the library (the CLI is markitect-specific)
+- Async adapters (current sync interface is sufficient; can be added later)