tegwick eaf4a955af
docs(roadmap): add workplan for extracting llm module as shared library
3-stage plan: decouple (RunConfig/LLMResponse move + app name
parameterization) → extract to standalone package → adopt in first
consumer. Registered as workstream in Custodian State Hub.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 21:51:54 +01:00


LLM Adapter Layer — Extract as Shared Library

Vision

The markitect.llm module is a clean, stdlib-only adapter layer for calling LLMs via OpenRouter, Gemini, OpenAI, and the Claude Code CLI. It implements a uniform interface, a 7-layer TOML config chain, embedding support with caching, and typed exceptions. It should be usable by all projects in the Bernd Worsch ecosystem without pulling in all of markitect.

This roadmap tracks extracting it into a standalone installable library.


Current State

The module lives at markitect/llm/ (~16 files, ~1500 LOC, stdlib-only) and provides:

  • 4 text adapters: OpenRouter, Gemini, OpenAI, Claude Code CLI
  • 2 embedding adapters: OpenAI-compatible (OpenAI + OpenRouter)
  • Embedding cache: JSON-backed, content-digest validated
  • Similarity utilities: pure-Python cosine similarity, matrix, pair-finding
  • 7-layer TOML config chain: CLI > env > user/dir preference/default (four TOML layers) > hardcoded
  • Typed exceptions: LLMError hierarchy
  • HTTP wrapper: urllib-only, typed exception translation
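
To make the "similarity utilities" item concrete, here is a minimal pure-Python sketch of cosine similarity and pair-finding. The names cosine_similarity and find_similar_pairs, the threshold parameter, and the zero-vector convention are assumptions for illustration; the module's actual function names and signatures may differ.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar_pairs(
    vectors: dict[str, Sequence[float]], threshold: float = 0.9
) -> list[tuple[str, str, float]]:
    """Return (key_a, key_b, score) for each pair scoring at or above threshold."""
    pairs = []
    for (key_a, vec_a), (key_b, vec_b) in combinations(vectors.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((key_a, key_b, score))
    # Highest-scoring pairs first.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)
```

Because only math and itertools are used, a helper like this keeps the stdlib-only property that makes the module cheap to extract.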

Two Coupling Issues Blocking Clean Extraction

| Issue | Location | Severity |
|---|---|---|
| RunConfig and LLMResponse are defined in markitect.prompts.execution.models, not in markitect.llm | markitect/prompts/execution/models.py | High: creates a cross-module import for all consumers |
| TOML config chain hardcodes "markitect" as app name (paths: ~/.config/markitect/, env prefix MARKITECT_, files: .markitect.toml) | markitect/llm/toml_config.py | Medium: consumers either accept markitect config or can't use the chain |

Terminology

  • adapter: concrete implementation of LLMAdapter for a single provider
  • factory: create_adapter() / create_embedding_adapter() — provider-agnostic entry points
  • config chain: 7-layer resolution of provider + model (CLI → env → TOML → hardcoded)
  • standalone library: a Python package installable with pip install from a git URL or local path, without PyPI
  • consumer: any project that imports and uses the library (markitect itself, custodian, railiance, etc.)
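
The relationship between "adapter" and "factory" can be sketched as follows. LLMAdapter and create_adapter are names from this roadmap; everything else (the complete method, the EchoAdapter stand-in, the registry dict, and the two-argument factory signature) is a hypothetical illustration rather than the module's real interface.

```python
from abc import ABC, abstractmethod
from typing import Callable


class LLMAdapter(ABC):
    """Uniform interface that every provider adapter implements."""

    @abstractmethod
    def complete(self, prompt: str) -> str:  # hypothetical method name
        ...


class EchoAdapter(LLMAdapter):
    """Stand-in provider for this sketch only; it makes no network call."""

    def __init__(self, model: str) -> None:
        self.model = model

    def complete(self, prompt: str) -> str:
        return f"[{self.model}] {prompt}"


# Hypothetical registry mapping provider names to adapter constructors.
_ADAPTERS: dict[str, Callable[[str], LLMAdapter]] = {"echo": EchoAdapter}


def create_adapter(provider: str, model: str) -> LLMAdapter:
    """Provider-agnostic entry point: callers never name a concrete class."""
    try:
        return _ADAPTERS[provider](model)
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None


adapter = create_adapter("echo", "demo-model")
print(adapter.complete("hello"))  # prints "[demo-model] hello"
```

The point of the split is that a consumer depends only on the factory and the abstract interface, so swapping providers is a configuration change rather than a code change.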

Packaging Decision (Pending)

Before Stage 2 starts, one architectural decision must be resolved:

D1: Where does the extracted library live?

Option A — Standalone repo (~/bw-llm or similar):

  • Clean separation, versioned independently, installable via pip install git+file:///... or git URL
  • Adds a repo to maintain; changes require bumping version in dependents

Option B — Subfolder of markitect with own pyproject.toml (monorepo-lite):

  • Stays co-located with the main codebase that will use it most
  • Less friction for iteration; single git history
  • Slightly unorthodox but valid for personal infrastructure

Option C — Just pip install markitect in other projects:

  • Zero extraction work; reuse today
  • Pulls all of markitect (prompts, infospace, CLI, etc.) as transitive deps
  • Acceptable short-term if other projects are small

Stages

Stage 1 — Decouple (within markitect)

Prepare the module for extraction without changing its public API.

S1.1 — Move RunConfig + LLMResponse into markitect.llm

RunConfig and LLMResponse are currently in markitect.prompts.execution.models. The LLM adapters import from there, creating a hard dependency on the prompt system.

Work:

  • Move both dataclasses to markitect/llm/models.py
  • Update all imports in markitect.llm and markitect.prompts
  • Keep a re-export shim in markitect.prompts.execution.models for backwards compat

Acceptance: markitect/llm/ has zero imports from markitect.prompts.*
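
A re-export shim is typically a one-line import in the old module. The sketch below simulates both modules in memory so that it runs without markitect installed; the module names demo_llm_models and demo_prompts_models and the dataclass fields are placeholders, not the real definitions.

```python
import sys
import types
from dataclasses import dataclass


# Placeholder dataclasses; the real fields are not reproduced here.
@dataclass
class RunConfig:
    model: str = "placeholder"


@dataclass
class LLMResponse:
    text: str = ""


# Stand-in for markitect/llm/models.py, the new home of both classes.
new_home = types.ModuleType("demo_llm_models")
new_home.RunConfig = RunConfig
new_home.LLMResponse = LLMResponse
sys.modules["demo_llm_models"] = new_home

# Stand-in for markitect/prompts/execution/models.py after the move.
# In the real file the whole shim would be one line along the lines of:
#     from markitect.llm.models import RunConfig, LLMResponse
shim = types.ModuleType("demo_prompts_models")
exec("from demo_llm_models import RunConfig, LLMResponse", shim.__dict__)
sys.modules["demo_prompts_models"] = shim

# Old import path still works and yields the very same class object.
from demo_prompts_models import RunConfig as OldPathRunConfig

print(OldPathRunConfig is new_home.RunConfig)  # prints "True"
```

Because the old path resolves to the same class object, isinstance checks and type annotations in existing callers keep working while imports are migrated.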

S1.2 — Parameterize the TOML config chain

Replace the hardcoded "markitect" app name with a configurable app_name parameter.

Work:

  • Add app_name: str = "markitect" parameter to resolve_llm() and the config path helpers in toml_config.py
  • Derive config file path (~/.config/{app_name}/config.toml), env variable names (upper-cased app name as prefix, e.g. {APP_NAME}_HELPER_MODEL), and local config file (.{app_name}.toml) from it
  • All existing behaviour is preserved when app_name="markitect" (default)

Acceptance: A consumer can call resolve_llm(app_name="railiance") and get config from ~/.config/railiance/config.toml and RAILIANCE_HELPER_MODEL.
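
A minimal sketch of the derivation, assuming a helper that returns all app-specific names at once. ConfigLocations, config_locations, and model_from_env are hypothetical names; the real change would live inside resolve_llm() and the path helpers in toml_config.py. Mapping hyphens to underscores in the env prefix is also an assumption.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class ConfigLocations:
    user_config: Path    # ~/.config/{app_name}/config.toml
    local_config: Path   # .{app_name}.toml in the working directory
    env_model_var: str   # {APP_NAME}_HELPER_MODEL


def config_locations(app_name: str = "markitect") -> ConfigLocations:
    """Derive every app-specific name from the single app_name parameter."""
    # Assumption: hyphens become underscores so the env var is a valid identifier.
    env_prefix = app_name.upper().replace("-", "_")
    return ConfigLocations(
        user_config=Path.home() / ".config" / app_name / "config.toml",
        local_config=Path.cwd() / f".{app_name}.toml",
        env_model_var=f"{env_prefix}_HELPER_MODEL",
    )


def model_from_env(app_name: str = "markitect") -> Optional[str]:
    """Read only the env layer; the other layers of the chain are omitted."""
    return os.environ.get(config_locations(app_name).env_model_var)


print(config_locations("railiance").env_model_var)  # prints "RAILIANCE_HELPER_MODEL"
```

Keeping "markitect" as the default value means existing callers that pass no argument resolve exactly the same paths and variables as before.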

S1.3 — Isolation tests

Write a test file that imports only from markitect.llm.* and verifies no accidental coupling remains.

Acceptance: pytest tests/test_llm_isolation.py passes; no import of markitect.prompts or markitect.infospace in the LLM module tree.
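
One way to enforce this acceptance criterion is a static scan of the module tree rather than a runtime import. The sketch below uses the standard ast module; the helper name forbidden_imports and the assumption that pytest runs from the repository root are illustrative, not taken from the existing test suite.

```python
import ast
from pathlib import Path

FORBIDDEN_PREFIXES = ("markitect.prompts", "markitect.infospace")


def _is_forbidden(module: str, forbidden: tuple[str, ...]) -> bool:
    return any(module == p or module.startswith(p + ".") for p in forbidden)


def forbidden_imports(
    package_dir: Path, forbidden: tuple[str, ...] = FORBIDDEN_PREFIXES
) -> list[tuple[str, str]]:
    """Return (file, module) for every absolute import of a forbidden module."""
    hits = []
    for path in sorted(package_dir.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                candidates = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # Also check "from markitect import prompts" style imports.
                candidates = [node.module] + [
                    f"{node.module}.{alias.name}" for alias in node.names
                ]
            else:
                continue  # not an import, or a relative import (not resolved here)
            match = next((m for m in candidates if _is_forbidden(m, forbidden)), None)
            if match is not None:
                hits.append((str(path), match))
    return hits


def test_llm_has_no_prompt_or_infospace_imports():
    # Assumption: pytest is invoked from the repository root.
    assert forbidden_imports(Path("markitect/llm")) == []
```

A static scan also catches imports hidden inside functions, which a simple "import the package and see" test would miss. Relative imports are skipped in this sketch and would need resolving against the package path to be covered.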


Stage 2 — Extract

S2.1 — Resolve D1: packaging location

Record the decision and create the package scaffold.

Acceptance: D1 resolved, pyproject.toml for the library exists at the chosen location with name, version 0.1.0, and declared dependencies.

S2.2 — Create standalone package

Move (or symlink) the llm module into the new package structure. Wire up the pyproject.toml entry points. Verify pip install -e <path> works.

Files to carry over:

llm/
  __init__.py          # re-exports: create_adapter, create_embedding_adapter,
                       #   LLMAdapter, EmbeddingAdapter, LLMConfig, exceptions
  models.py            # RunConfig, LLMResponse (moved from S1.1)
  config.py            # load_config, resolve_api_key
  toml_config.py       # resolve_llm (parameterized from S1.2)
  factory.py           # create_adapter
  exceptions.py        # LLM exception hierarchy
  openrouter.py
  claude_code.py
  gemini.py
  openai.py
  embedding_adapter.py
  embedding_openai.py
  embedding_factory.py # create_embedding_adapter
  embedding_cache.py
  similarity.py
  _http.py
  _token_estimator.py

Acceptance: python -c "from bw_llm import create_adapter; print('ok')" works in a fresh venv with only the new package installed (bw_llm is the provisional import name).

S2.3 — Update markitect to depend on extracted package

Add the new package as a path dependency in markitect's pyproject.toml, then either remove markitect/llm/ or reduce it to an import alias that points to the new package.

Acceptance: All markitect tests pass; markitect/llm/__init__.py is either removed or becomes a thin re-export of bw_llm.

S2.4 — Integration smoke test

Run the full markitect infospace pipeline (entity extraction + evaluation) end-to-end against a small fixture to confirm nothing broke.

Acceptance: markitect infospace evaluate --dry-run succeeds on a 3-entity fixture.


Stage 3 — Adopt in First Consumer

S3.1 — Integrate in one other project

Pick the first real consumer (likely the custodian state-hub, for LLM-assisted state summaries or decision rationale generation) and wire up the library.

Work:

  • Add bw-llm (or equivalent) as a dependency
  • Write a small usage example (e.g., llm_helper.py)
  • Confirm config chain works with the consumer's own app name

S3.2 — Usage guide

Write README.md for the library covering:

  • Installation (local path / git URL)
  • Supported providers and env vars
  • TOML config file locations and format
  • create_adapter() / create_embedding_adapter() quick-start
  • Error handling

Acceptance: Another developer (or agent) can follow the README to use the library in a new project without reading source code.


Stage Summary

| Stage | Description | Key Deliverable | Blocks |
|---|---|---|---|
| S1.1 | Move RunConfig/LLMResponse to llm | Zero cross-module deps | S2.2 |
| S1.2 | Parameterize app name | Configurable config chain | S2.2 |
| S1.3 | Isolation tests | Green test suite | S2.1 |
| S2.1 | Resolve packaging decision (D1) | pyproject.toml scaffold | S2.2 |
| S2.2 | Create standalone package | pip install works | S2.3 |
| S2.3 | Update markitect | markitect uses extracted lib | S2.4 |
| S2.4 | Integration smoke test | Full pipeline passes | S3.1 |
| S3.1 | First consumer integration | Library used in real project | S3.2 |
| S3.2 | Usage guide | README published | - |

Out of Scope

  • Publishing to PyPI (unnecessary for personal infrastructure; git/local installs suffice)
  • Adding new LLM providers (separate concern)
  • Porting the helper CLI to the library (the CLI is markitect-specific)
  • Async adapters (current sync interface is sufficient; can be added later)