# Internal Extension Framework

## Purpose

Markitect has reached the point where optional features are useful but are starting to concentrate wiring in central modules. Query engines, processors, backend stores, references, contract checks, templates, generation adapters, and CLI commands all need some combination of registration, capability metadata, diagnostics, provenance, and optional dependency handling.

The internal extension framework should make those seams explicit without turning the project into a heavy external plugin platform.

## Boundary

This framework is about internal extensibility:

```text
feature descriptor
  -> registry
  -> processing request/context/result
  -> diagnostics/provenance/capabilities
  -> CLI/API/backend integration
```

It is not the same as `MKTT-WP-0011` dataflow workflows. Workflows organize business-facing processing steps for a document pipeline. The extension framework organizes how Markitect itself exposes and composes capabilities.

## Extension Taxonomy

| Kind | Examples | Primary Contract |
| --- | --- | --- |
| `query-engine` | selector, JSONPath | document/data in, matches out |
| `processor` | identity, uppercase, include | fenced block in, processed result out |
| `backend` | local SQLite index | snapshots/index/search storage |
| `reference-provider` | section, region, fence, line | address in, content units out |
| `validator` | schema, contract, section assertion | document/context in, diagnostics out |
| `runtime` | context loader, form state, dynamic rules | document/contract/context in, diagnostics and state out |
| `assessment-runner` | provider-neutral rubric execution | assessment request in, normalized result out |
| `policy-gateway` | local label gateway, future external auth adapters | subject/action/object in, decision or filtered results out |
| `template-engine` | deterministic templates | template/data in, Markdown out |
| `generation-adapter` | provider-neutral assisted generation | request in, generated candidate out |
| `cli-group` | cache, backend, ref, class | command descriptors or registration hook |
| `render-export` | future Quarkdown/export adapters | Markdown source in, rendered/exported artifact out |
| `document-function` | future function layer | function call in, typed document value out |

## Canonical Lifecycle

An extension should be describable before it runs:

1. Register descriptor.
2. Check optional dependencies.
3. Check capabilities and policy labels.
4. Build processing context.
5. Execute operation.
6. Normalize result, diagnostics, provenance, and trace data.
7. Expose output through library API, CLI, backend, or workflow layer.

The framework should allow deterministic extensions to stay simple. Assisted, external, networked, or filesystem-mutating extensions should declare that explicitly before execution.

## Descriptor Shape

The first descriptor model should cover:

- stable id
- kind
- version
- summary
- implementation reference
- capability declarations
- optional dependency declarations
- safety flags
- input and output contract names
- diagnostics namespace
- provenance operation prefix
- documentation and example links
- CLI affordances where applicable

Descriptors are not meant to replace implementation modules. They are the small declarative surface that lets Markitect inspect, list, validate, and compose capabilities consistently.
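As a rough illustration of the descriptor fields listed above, the following sketch models them as a frozen dataclass. The class name `ExtensionDescriptor`, every field name, the safety flag strings, and the example values are assumptions made for illustration; none of them is confirmed Markitect API.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical flag names, taken from the wording of the lifecycle section.
RISKY_FLAGS = frozenset({"assisted", "external", "networked", "filesystem-mutating"})


@dataclass(frozen=True)
class ExtensionDescriptor:
    """Illustrative descriptor shape; not the real Markitect class."""

    id: str                                   # stable id
    kind: str                                 # e.g. "query-engine", "backend"
    version: str
    summary: str
    implementation: str                       # reference such as "module:callable"
    capabilities: tuple[str, ...] = ()
    optional_dependencies: tuple[str, ...] = ()
    safety_flags: frozenset[str] = frozenset()
    input_contract: str = ""
    output_contract: str = ""
    diagnostics_namespace: str = ""
    provenance_prefix: str = ""
    docs: tuple[str, ...] = ()
    cli_commands: tuple[str, ...] = ()

    def is_deterministic(self) -> bool:
        """True when no assisted/external/networked/mutating flag is declared."""
        return not (self.safety_flags & RISKY_FLAGS)


# Example values below are made up for the sketch.
selector = ExtensionDescriptor(
    id="query.selector",
    kind="query-engine",
    version="0.1.0",
    summary="Selector-based document queries",
    implementation="example_pkg.query_selector:create",
    diagnostics_namespace="query",
    provenance_prefix="query.selector",
)

assisted = ExtensionDescriptor(
    id="generation.example",
    kind="generation-adapter",
    version="0.1.0",
    summary="Assisted generation adapter (illustrative)",
    implementation="example_pkg.generation:create",
    safety_flags=frozenset({"assisted", "networked"}),
)

print(selector.is_deterministic(), assisted.is_deterministic())
```

Keeping the descriptor frozen and free of callables means it can be listed, validated, and serialized without importing the implementation module, which is the property the section asks for.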
## Processing Model

The canonical processing model should define a small set of shared envelopes:

- `ProcessingRequest`: operation id, input payload, options, scope
- `ProcessingContext`: root, source path, namespaces, variables, policy, backend handles, and caller metadata
- `ProcessingResult`: output payload, diagnostics, provenance, dependencies, trace events, and validity
- `ProcessingDiagnostic`: severity, code, message, source, help
- `ProcessingCapability`: declared feature or permission requirement
- `ProcessingProvenance`: operation, source identity, snapshot/content hashes, dependencies, backend/provider metadata

Subsystem-specific types may remain richer. The canonical model is the bridge, not a forced replacement for every local dataclass.

### Processing Request Example

```python
from pathlib import Path

from markitect_tool.extension import ProcessingContext, ProcessingRequest

request = ProcessingRequest(
    operation="query.selector",
    input={"selector": "sections[heading=Decision]"},
    context=ProcessingContext(
        source_path=Path("docs/adr.md"),
        namespaces={"std": "standards"},
        variables={"audience": "internal"},
    ),
    options={"format": "json"},
    scope="document",
)
```

The request cache key includes operation, input, options, scope, declared capabilities, metadata, and stable context semantics:

- source path
- namespaces
- variables
- policy
- metadata

It intentionally excludes root path, caller name, and live backend handles. Those are execution-environment details. If they matter semantically for an extension, put explicit values in request options or metadata.
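The cache key rule above can be sketched as follows. This is a minimal illustration that uses plain dictionaries rather than the real `ProcessingRequest` and `ProcessingContext` classes; the function name `request_cache_key`, the dictionary keys, and the choice of canonical JSON plus SHA-256 are all assumptions, not the framework's actual implementation.

```python
import hashlib
import json

# Context fields treated as stable semantics, per the list above.
STABLE_CONTEXT_FIELDS = ("source_path", "namespaces", "variables", "policy", "metadata")


def request_cache_key(request: dict, context: dict) -> str:
    """Hash the semantic parts of a request; skip root, caller, and backend handles."""
    stable_context = {name: context.get(name) for name in STABLE_CONTEXT_FIELDS}
    payload = {
        "operation": request["operation"],
        "input": request.get("input"),
        "options": request.get("options", {}),
        "scope": request.get("scope"),
        "capabilities": sorted(request.get("capabilities", [])),
        "metadata": request.get("metadata", {}),
        "context": stable_context,
    }
    # sort_keys makes the serialization independent of dict insertion order;
    # default=str lets values such as pathlib.Path serialize.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


request = {
    "operation": "query.selector",
    "input": {"selector": "sections[heading=Decision]"},
    "options": {"format": "json"},
    "scope": "document",
}
ctx_a = {
    "root": "/home/alice/project",   # execution-environment detail, ignored
    "caller": "cli",                 # execution-environment detail, ignored
    "source_path": "docs/adr.md",
    "namespaces": {"std": "standards"},
    "variables": {"audience": "internal"},
}
ctx_b = dict(ctx_a, root="/srv/ci/checkout", caller="api")
ctx_c = dict(ctx_a, variables={"audience": "public"})

key_a = request_cache_key(request, ctx_a)
key_b = request_cache_key(request, ctx_b)
key_c = request_cache_key(request, ctx_c)
print(key_a == key_b, key_a == key_c)
```

Two checkouts of the same project at different root paths produce the same key, while a change to a variable produces a different one. An extension whose output does depend on the root would need to copy that value into options or metadata, as the text recommends.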
### Processing Result Example

```python
from markitect_tool.extension import (
    ProcessingProvenance,
    ProcessingResult,
    ProcessingTrace,
)

result = ProcessingResult(
    output={"count": 1},
    provenance=[
        ProcessingProvenance(
            operation="query.selector",
            source_path="docs/adr.md",
            content_hash="sha256:...",
        )
    ],
).with_trace(ProcessingTrace(event="query.done"))
```

`ProcessingResult.valid` is derived from diagnostics. Any diagnostic with severity `error` makes the result invalid.

## Registration Strategy

Start with in-package registration:

```text
markitect_tool/extensions/
  query_selector.py
  query_jsonpath.py
  backend_local_sqlite.py
  processors_builtin.py
```

Each module exposes one or more descriptors plus a registration function. The root registry can be assembled explicitly at import time or by a small internal discovery list.

Package entry points can be added later if external extension packages become a real requirement.

See `docs/extension-authoring.md` for the extension authoring checklist and descriptor template.

### Registry Use

Extension registries are optimized for common lookup patterns:

- `registry.get("backend.local-sqlite")`
- `registry.list(kind="query-engine")`
- `registry.require_capability("fts")`
- `registry.check_dependencies("jsonpath")`

Kinds and capabilities are indexed at registration time, so large registries can avoid repeated full scans for basic discovery.

The same registry is exposed on the CLI for practical discovery:

```bash
mkt extension list
mkt extension inspect memory.context-package
mkt extension commands
```

This makes the extension catalog part of the user-visible contract. When a capability gains a CLI command, the descriptor should declare that command so generated docs and audits can compare intent with the live surface.

### Execution Lifecycle

`ExtensionExecutor` wraps a descriptor factory with deterministic lifecycle hooks:

1. Fetch descriptor.
2. Check required optional dependencies.
3. Instantiate callable implementation.
4. Run `before` callbacks.
5. Execute implementation.
6. Normalize result type.
7. Append `extension.executed` trace.
8. Run success or failure callbacks.
9. Run final `after` callbacks.

Callbacks are explicit. The framework does not introduce hidden global behavior.

## Compatibility Rules

The refactor must preserve:

- current library APIs such as `query_document`
- current CLI commands and output envelopes
- current diagnostic codes where users may rely on them
- current provenance operation strings unless intentionally deprecated
- optional dependency behavior for JSONPath and future adapters
- cache/index file compatibility unless a migration is documented

The first implementation adds canonical processing envelopes, extension descriptors, registries, lifecycle callbacks, query-engine registry shims, built-in extension descriptors, and CLI command specs while preserving existing public commands.

## Characterization Coverage

Before refactoring, lock down:

- selector query and extraction
- optional JSONPath diagnostics
- processor registry behavior and provenance
- backend manifest registry and capability checks
- local SQLite snapshot/index/search behavior
- content reference resolution
- representative CLI command envelopes
- provenance and diagnostic shapes

These tests are deliberately a little redundant with unit tests. Their job is to protect the current public behavior while internals move behind extension descriptors and registries.
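One possible way to lock down "provenance and diagnostic shapes" is to compare the structure of an envelope rather than its concrete values. The sketch below is only an illustration of that idea: `shape_of`, `run_query_stub`, the envelope keys, and the diagnostic code are hypothetical stand-ins, and a real characterization test would call the existing library API or CLI instead of the stub.

```python
def shape_of(value):
    """Reduce a payload to key names and type names, dropping concrete values."""
    if isinstance(value, dict):
        return {key: shape_of(item) for key, item in sorted(value.items())}
    if isinstance(value, list):
        # Assumes homogeneous lists: the first element stands in for the rest.
        return [shape_of(value[0])] if value else []
    return type(value).__name__


# Hypothetical shape recorded from the current behavior before refactoring.
EXPECTED_SHAPE = {
    "diagnostics": [{"code": "str", "message": "str", "severity": "str"}],
    "output": {"count": "int"},
    "provenance": [
        {"content_hash": "str", "operation": "str", "source_path": "str"}
    ],
    "valid": "bool",
}


def run_query_stub():
    """Stand-in for a real library or CLI call; returns an illustrative envelope."""
    return {
        "output": {"count": 1},
        "diagnostics": [
            {"severity": "warning", "code": "example.code", "message": "illustrative"}
        ],
        "provenance": [
            {
                "operation": "query.selector",
                "source_path": "docs/adr.md",
                "content_hash": "sha256:...",
            }
        ],
        "valid": True,
    }


def test_query_envelope_shape_is_stable():
    # Fails if a refactor adds, removes, renames, or retypes an envelope field.
    assert shape_of(run_query_stub()) == EXPECTED_SHAPE


test_query_envelope_shape_is_stable()
```

A shape check of this kind tolerates changes in counts or hashes but fails when a field disappears or changes type, which matches the stated goal of protecting public behavior while internals move.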