Internal Extension Authoring
Purpose
This guide describes how to add a new internal Markitect extension without turning central modules into the main integration surface.
Use this for internal query engines, processors, backend/index stores, reference providers, validators, template/generation adapters, CLI command groups, render/export adapters, and future document functions.
Recommended Shape
Each extension should have:
- implementation module
- descriptor or descriptor factory
- focused tests
- characterization coverage if it changes existing behavior
- documentation or example link
- diagnostic namespace
- provenance operation prefix
- optional dependency declaration
- capability and safety declarations
Prefer this shape:
src/markitect_tool/<area>/<feature>.py
tests/test_<area>_<feature>.py
docs/<feature>.md
If the extension is cross-cutting, register it from
markitect_tool.extension.builtins or a future internal discovery module rather
than importing it from many central files.
Descriptor Template
from markitect_tool.extension import ExtensionDescriptor, ProcessingCapability


def my_extension_descriptor() -> ExtensionDescriptor:
    return ExtensionDescriptor(
        id="query.example",
        kind="query-engine",
        summary="Example query engine.",
        capabilities=[
            ProcessingCapability(id="ast", kind="read"),
        ],
        input_contract="Document + example expression",
        output_contract="QueryMatch[]",
        diagnostics_namespace="query.example",
        provenance_prefix="query.example",
        cli={"commands": ["mkt query --engine example"]},
        docs=["docs/example-query.md"],
        examples=["examples/query/example.md"],
    )
Optional Dependencies
Declare optional dependencies in descriptors:
from markitect_tool.extension import OptionalDependency

OptionalDependency(
    name="jsonpath_ng",
    package="jsonpath-ng",
    extra="query",
    required=True,
    purpose="Evaluate JSONPath expressions.",
)
If a dependency is missing, return a structured diagnostic. Do not fail with an unexplained import error.
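The check described above can be sketched roughly as follows. This is a minimal illustration, not Markitect's implementation: the `Diagnostic` dataclass and the `check_dependency` helper are hypothetical stand-ins, and only `importlib.util.find_spec` is a real standard-library call.

```python
import importlib.util
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Diagnostic:
    # Hypothetical stand-in; the real Markitect diagnostic type may differ.
    code: str
    message: str


def check_dependency(module: str, package: str, extra: str) -> Optional[Diagnostic]:
    """Return a diagnostic if the top-level `module` is not importable, else None."""
    # find_spec returns None for a missing top-level module instead of raising.
    if importlib.util.find_spec(module) is not None:
        return None
    return Diagnostic(
        code="extension.missing_dependency",
        message=(
            f"Module '{module}' is not installed; "
            f"install package '{package}' (extra: '{extra}')."
        ),
    )
```

A caller could run `check_dependency("jsonpath_ng", "jsonpath-ng", "query")` before creating the engine and return the diagnostic in its result instead of letting an `ImportError` escape.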
Processing Envelopes
Use canonical processing envelopes where an extension needs a shared execution boundary:
- ProcessingRequest
- ProcessingContext
- ProcessingResult
- ProcessingCapability
- ProcessingProvenance
- ProcessingTrace
Subsystem-specific dataclasses may remain richer. The canonical model is the bridge that lets callbacks, registries, diagnostics, provenance, and future policy checks interact consistently.
Minimal Runnable Extension
from markitect_tool.extension import (
    ExtensionDescriptor,
    ExtensionExecutor,
    ExtensionRegistry,
    ProcessingRequest,
    ProcessingResult,
)


def run_example(request: ProcessingRequest) -> ProcessingResult:
    name = request.input.get("name", "world")
    return ProcessingResult(output=f"Hello, {name}")


descriptor = ExtensionDescriptor(
    id="example.hello",
    kind="example",
    summary="Small example extension.",
    factory=lambda: run_example,
)

registry = ExtensionRegistry([descriptor])
result = ExtensionExecutor(registry).execute(
    "example.hello",
    ProcessingRequest(operation="example.hello", input={"name": "Markitect"}),
)
Use this executor boundary when callbacks, dependency checks, trace events, or future policy checks matter. For tiny deterministic helpers, it is still fine to keep the existing direct function API and expose a descriptor alongside it.
Cache-Key Rules
ProcessingRequest.cache_key includes:
- operation
- input
- stable context material
- options
- scope
- declared capabilities
- request metadata
Stable context material includes source path, namespaces, variables, policy, and metadata. It does not include workspace root, caller, or live backend handles. This keeps cache keys portable while avoiding collisions for context-sensitive operations.
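To make the rule concrete, here is one way such a key could be derived. It is a sketch under assumptions: the function name `sketch_cache_key`, the dictionary field names, and the use of SHA-256 over canonical JSON are all illustrative choices, not the actual `ProcessingRequest.cache_key` implementation.

```python
import hashlib
import json
from typing import Any, Iterable, Mapping, Optional

# Context fields treated as stable, following the list above.
# The exact field names are assumptions made for this sketch.
STABLE_CONTEXT_FIELDS = ("source_path", "namespaces", "variables", "policy", "metadata")


def sketch_cache_key(
    operation: str,
    input_data: Any,
    context: Mapping[str, Any],
    options: Optional[Mapping[str, Any]] = None,
    scope: Optional[str] = None,
    capabilities: Iterable[str] = (),
    metadata: Optional[Mapping[str, Any]] = None,
) -> str:
    # Keep only stable context; workspace root, caller, and live handles are dropped.
    stable_context = {k: context[k] for k in STABLE_CONTEXT_FIELDS if k in context}
    material = {
        "operation": operation,
        "input": input_data,
        "context": stable_context,
        "options": dict(options or {}),
        "scope": scope,
        "capabilities": sorted(capabilities),
        "metadata": dict(metadata or {}),
    }
    # Sorted keys and fixed separators give a deterministic encoding.
    encoded = json.dumps(material, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()
```

With this shape, two requests that differ only in workspace root or caller produce the same key, while a different source path produces a different one.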
Diagnostics
Diagnostics should be:
- stable enough for tests and callers
- namespaced by subsystem or extension
- explicit about optional dependency failures
- tied to source locations where possible
- emitted as Diagnostic or ProcessingResult.from_error
Recommended code style:
<extension-kind>.<condition>
query.invalid_jsonpath
processor.unknown
extension.missing_dependency
backend.local_sqlite.invalid_fts_query
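A test suite could guard this naming style with a small check like the one below. The pattern is inferred from the examples above (lowercase, dot-separated segments, underscores allowed) and is an assumption, not a rule stated by the framework; `is_valid_diagnostic_code` is a hypothetical helper.

```python
import re

# Assumed shape: two or more lowercase segments joined by dots,
# each starting with a letter and containing letters, digits, or underscores.
CODE_PATTERN = re.compile(r"[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+")


def is_valid_diagnostic_code(code: str) -> bool:
    """Return True if `code` matches the assumed namespaced style."""
    return CODE_PATTERN.fullmatch(code) is not None
```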
Provenance
Every extension that transforms, queries, reads, writes, generates, or indexes content should expose provenance. Use a stable operation prefix:
query.selector
query.jsonpath
processor.include
local_snapshot_store.put_file
Include source path, content hash, snapshot id, backend/provider id, and dependencies when known.
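As an illustration of the fields listed above, a provenance record might look like this. `ProvenanceRecord` and `provenance_for` are hypothetical names for this sketch; the canonical `ProcessingProvenance` type may carry different fields, and the `sha256:` hash prefix is an assumed convention.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProvenanceRecord:
    # Hypothetical stand-in for the canonical provenance type.
    operation: str
    source_path: Optional[str] = None
    content_hash: Optional[str] = None
    snapshot_id: Optional[str] = None
    provider_id: Optional[str] = None
    dependencies: Tuple[str, ...] = ()


def provenance_for(operation: str, source_path: str, content: bytes, **known) -> ProvenanceRecord:
    """Build a record, hashing the content and passing through any known extras."""
    digest = hashlib.sha256(content).hexdigest()
    return ProvenanceRecord(
        operation=operation,
        source_path=source_path,
        content_hash=f"sha256:{digest}",
        **known,
    )
```

Fields that are not known at call time stay `None` instead of being guessed, which matches the "when known" wording above.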
Safety And Policy
Descriptors should declare safety-relevant behavior:
- reads files
- writes local cache
- writes user output files
- accesses network
- invokes external process
- calls assisted-generation provider
- transmits content outside the local process
The initial framework records this metadata. Later policy layers can enforce it.
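One possible way to record these declarations so that a later policy layer could check them is sketched below. The `SafetyFlag` enum, its string values, and the `disallowed` helper are all assumptions made for illustration; the source does not specify how descriptors encode safety metadata.

```python
from enum import Enum
from typing import Iterable, List


class SafetyFlag(Enum):
    # Hypothetical identifiers mirroring the behaviors listed above.
    READS_FILES = "reads-files"
    WRITES_LOCAL_CACHE = "writes-local-cache"
    WRITES_USER_OUTPUT = "writes-user-output"
    ACCESSES_NETWORK = "accesses-network"
    INVOKES_EXTERNAL_PROCESS = "invokes-external-process"
    CALLS_GENERATION_PROVIDER = "calls-generation-provider"
    TRANSMITS_CONTENT = "transmits-content"


def disallowed(declared: Iterable[SafetyFlag], allowed: Iterable[SafetyFlag]) -> List[SafetyFlag]:
    """Return declared behaviors that a policy does not allow, in a stable order."""
    return sorted(set(declared) - set(allowed), key=lambda flag: flag.value)
```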
CLI Affordances
If an extension exposes CLI behavior, declare it in descriptor.cli:
cli={"commands": ["mkt cache index", "mkt search"]}
markitect_tool.cli.extensions.collect_cli_command_specs() can inspect these
affordances without importing Click command implementations.
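The idea of reading CLI affordances from descriptor metadata alone can be sketched as follows. `StubDescriptor` and `collect_commands` are hypothetical stand-ins; the real `collect_cli_command_specs()` may take different arguments and return a richer structure.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class StubDescriptor:
    # Hypothetical stand-in carrying only the fields this sketch needs.
    id: str
    cli: Dict[str, List[str]] = field(default_factory=dict)


def collect_commands(descriptors: Iterable[StubDescriptor]) -> List[Tuple[str, str]]:
    """Return sorted (command, extension id) pairs using descriptor data only."""
    pairs = []
    for descriptor in descriptors:
        # Descriptors without CLI behavior simply contribute nothing.
        for command in descriptor.cli.get("commands", []):
            pairs.append((command, descriptor.id))
    return sorted(pairs)
```

Because only plain data is inspected, listing commands this way does not import any command implementation modules.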
Testing Checklist
Add tests for:
- descriptor serialization
- registry lookup and duplicate handling
- missing optional dependency diagnostics
- canonical result validity
- provenance shape
- CLI output envelope if public commands are exposed
- compatibility shim if replacing an existing API
When refactoring an existing feature, add characterization tests first, then migrate implementation behind descriptors or registries.
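Two of the checklist items, descriptor serialization and duplicate handling, might be tested along these lines. The stub classes are hypothetical stand-ins so the example runs on its own; in particular, raising `ValueError` on a duplicate id is an assumption, and the real registry may report duplicates differently.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class StubDescriptor:
    # Hypothetical stand-in for ExtensionDescriptor.
    id: str
    kind: str
    summary: str = ""
    docs: Tuple[str, ...] = ()


class StubRegistry:
    # Hypothetical stand-in for ExtensionRegistry.
    def __init__(self) -> None:
        self._by_id = {}

    def register(self, descriptor: StubDescriptor) -> None:
        if descriptor.id in self._by_id:
            raise ValueError(f"duplicate extension id: {descriptor.id}")
        self._by_id[descriptor.id] = descriptor

    def get(self, extension_id: str) -> StubDescriptor:
        return self._by_id[extension_id]


def test_descriptor_serializes_to_json() -> None:
    descriptor = StubDescriptor(id="query.example", kind="query-engine")
    # Round-trip through JSON to confirm the descriptor holds only plain data.
    payload = json.loads(json.dumps(asdict(descriptor)))
    assert payload["id"] == "query.example"
    assert payload["docs"] == []


def test_registry_rejects_duplicate_ids() -> None:
    registry = StubRegistry()
    descriptor = StubDescriptor(id="query.example", kind="query-engine")
    registry.register(descriptor)
    assert registry.get("query.example") is descriptor
    try:
        registry.register(descriptor)
    except ValueError as exc:
        assert "query.example" in str(exc)
    else:
        raise AssertionError("expected duplicate registration to fail")
```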
Boundary With Workflows
Internal extensions describe what Markitect can do. Workflows describe how a user combines capabilities for a concrete document pipeline.
An extension may expose a workflow step later, but it should not depend on the workflow engine to be useful from the library or CLI.