# Internal Extension Authoring

## Purpose

This guide describes how to add a new internal Markitect extension without turning central modules into the main integration surface.

Use this for internal query engines, processors, backend/index stores, reference providers, validators, template/generation adapters, CLI command groups, render/export adapters, and future document functions.

## Recommended Shape

Each extension should have:

- implementation module
- descriptor or descriptor factory
- focused tests
- characterization coverage if it changes existing behavior
- documentation or example link
- diagnostic namespace
- provenance operation prefix
- optional dependency declaration
- capability and safety declarations

Prefer this shape:

```text
src/markitect_tool//.py
tests/test__.py
docs/.md
```

If the extension is cross-cutting, register it from `markitect_tool.extension.builtins` or a future internal discovery module rather than importing it from many central files.

## Descriptor Template

```python
from markitect_tool.extension import ExtensionDescriptor, ProcessingCapability


def my_extension_descriptor() -> ExtensionDescriptor:
    return ExtensionDescriptor(
        id="query.example",
        kind="query-engine",
        summary="Example query engine.",
        capabilities=[
            ProcessingCapability(id="ast", kind="read"),
        ],
        input_contract="Document + example expression",
        output_contract="QueryMatch[]",
        diagnostics_namespace="query.example",
        provenance_prefix="query.example",
        cli={"commands": ["mkt query --engine example"]},
        docs=["docs/example-query.md"],
        examples=["examples/query/example.md"],
    )
```

## Optional Dependencies

Declare optional dependencies in descriptors:

```python
from markitect_tool.extension import OptionalDependency

OptionalDependency(
    name="jsonpath_ng",
    package="jsonpath-ng",
    extra="query",
    required=True,
    purpose="Evaluate JSONPath expressions.",
)
```

If a dependency is missing, return a structured diagnostic. Do not fail with an unexplained import error.

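The missing-dependency rule can be illustrated with a minimal standard-library sketch. `SketchDiagnostic` and `check_optional_dependency` are hypothetical names invented for this example and are not part of `markitect_tool`; a real extension would presumably emit the project's own `Diagnostic` type. Only the `extension.missing_dependency` code is taken from this guide.

```python
import importlib.util
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchDiagnostic:
    """Hypothetical stand-in for a structured diagnostic (not the real Markitect type)."""

    code: str
    message: str
    severity: str = "error"


def check_optional_dependency(
    module_name: str, package: str, extra: str
) -> Optional[SketchDiagnostic]:
    """Return None if the top-level module is importable, else a diagnostic.

    The module is only looked up, never imported, so a missing package
    cannot surface as an unexplained ImportError.
    """
    if importlib.util.find_spec(module_name) is not None:
        return None
    return SketchDiagnostic(
        code="extension.missing_dependency",
        message=(
            f"Optional dependency '{package}' is not installed "
            f"(needed by the '{extra}' extra)."
        ),
    )


# Usage: check before running the extension and surface the diagnostic
# to the caller instead of letting the import fail later.
diagnostic = check_optional_dependency("jsonpath_ng", "jsonpath-ng", "query")
if diagnostic is not None:
    print(f"{diagnostic.code}: {diagnostic.message}")
```

Checking with `importlib.util.find_spec` keeps the failure path explicit: the caller receives a stable, namespaced code it can test against rather than a traceback.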
## Processing Envelopes

Use canonical processing envelopes where an extension needs a shared execution boundary:

- `ProcessingRequest`
- `ProcessingContext`
- `ProcessingResult`
- `ProcessingCapability`
- `ProcessingProvenance`
- `ProcessingTrace`

Subsystem-specific dataclasses may remain richer. The canonical model is the bridge that lets callbacks, registries, diagnostics, provenance, and future policy checks interact consistently.

### Minimal Runnable Extension

```python
from markitect_tool.extension import (
    ExtensionDescriptor,
    ExtensionExecutor,
    ExtensionRegistry,
    ProcessingRequest,
    ProcessingResult,
)


def run_example(request: ProcessingRequest) -> ProcessingResult:
    name = request.input.get("name", "world")
    return ProcessingResult(output=f"Hello, {name}")


descriptor = ExtensionDescriptor(
    id="example.hello",
    kind="example",
    summary="Small example extension.",
    factory=lambda: run_example,
)

registry = ExtensionRegistry([descriptor])
result = ExtensionExecutor(registry).execute(
    "example.hello",
    ProcessingRequest(operation="example.hello", input={"name": "Markitect"}),
)
```

Use this executor boundary when callbacks, dependency checks, trace events, or future policy checks matter. For tiny deterministic helpers, it is still fine to keep the existing direct function API and expose a descriptor alongside it.

### Cache-Key Rules

`ProcessingRequest.cache_key` includes:

- operation
- input
- stable context material
- options
- scope
- declared capabilities
- request metadata

Stable context material includes source path, namespaces, variables, policy, and metadata. It does not include workspace root, caller, or live backend handles. This keeps cache keys portable while avoiding collisions for context-sensitive operations.

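To make the portability argument concrete, here is a rough sketch of a key derived only from the stable fields listed above. It is not the actual `ProcessingRequest.cache_key` implementation: `sketch_cache_key`, the dictionary-based context, and the choice of SHA-256 over canonical JSON are all assumptions made for illustration.

```python
import hashlib
import json

# Context fields this guide lists as stable. Workspace root, caller, and
# live backend handles are deliberately left out of the key.
STABLE_CONTEXT_FIELDS = ("source_path", "namespaces", "variables", "policy", "metadata")


def sketch_cache_key(
    operation,
    input_data,
    context,
    options=None,
    scope=None,
    capabilities=(),
    metadata=None,
):
    """Hypothetical helper: hash only the portable parts of a request."""
    stable_context = {name: context.get(name) for name in STABLE_CONTEXT_FIELDS}
    payload = {
        "operation": operation,
        "input": input_data,
        "context": stable_context,
        "options": options or {},
        "scope": scope,
        "capabilities": sorted(capabilities),
        "metadata": metadata or {},
    }
    # sort_keys gives a canonical encoding, so dict ordering cannot change the key.
    encoded = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


# Two checkouts of the same project produce the same key even though the
# workspace root and caller differ.
shared = {"source_path": "docs/a.md", "namespaces": {}, "variables": {}}
key_a = sketch_cache_key(
    "query.example", {"q": "x"}, {**shared, "workspace_root": "/tmp/a", "caller": "cli"}
)
key_b = sketch_cache_key(
    "query.example", {"q": "x"}, {**shared, "workspace_root": "/home/b", "caller": "api"}
)
print(key_a == key_b)
```

Changing any stable field, such as `source_path`, yields a different key, which is what prevents collisions for context-sensitive operations.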
## Diagnostics

Diagnostics should be:

- stable enough for tests and callers
- namespaced by subsystem or extension
- explicit about optional dependency failures
- tied to source locations where possible
- emitted as `Diagnostic` or `ProcessingResult.from_error`

Recommended code style:

```text
.
query.invalid_jsonpath
processor.unknown
extension.missing_dependency
backend.local_sqlite.invalid_fts_query
```

## Provenance

Every extension that transforms, queries, reads, writes, generates, or indexes content should expose provenance.

Use a stable operation prefix:

```text
query.selector
query.jsonpath
processor.include
local_snapshot_store.put_file
```

Include source path, content hash, snapshot id, backend/provider id, and dependencies when known.

## Safety And Policy

Descriptors should declare safety-relevant behavior:

- reads files
- writes local cache
- writes user output files
- accesses network
- invokes external process
- calls assisted-generation provider
- transmits content outside the local process

The initial framework records this metadata. Later policy layers can enforce it.

## CLI Affordances

If an extension exposes CLI behavior, declare it in `descriptor.cli`:

```python
cli={"commands": ["mkt cache index", "mkt search"]}
```

`markitect_tool.cli.extensions.collect_cli_command_specs()` can inspect these affordances without importing Click command implementations.

## Testing Checklist

Add tests for:

- descriptor serialization
- registry lookup and duplicate handling
- missing optional dependency diagnostics
- canonical result validity
- provenance shape
- CLI output envelope if public commands are exposed
- compatibility shim if replacing an existing API

When refactoring an existing feature, add characterization tests first, then migrate implementation behind descriptors or registries.

## Boundary With Workflows

Internal extensions describe what Markitect can do. Workflows describe how a user combines capabilities for a concrete document pipeline.

An extension may expose a workflow step later, but it should not depend on the workflow engine to be useful from the library or CLI.