Internal Extension Framework
Purpose
Markitect has reached the point where optional features are useful but are starting to concentrate wiring in central modules. Query engines, processors, backend stores, references, contract checks, templates, generation adapters, and CLI commands all need some combination of registration, capability metadata, diagnostics, provenance, and optional dependency handling.
The internal extension framework should make those seams explicit without turning the project into a heavy external plugin platform.
Boundary
This framework is about internal extensibility:
feature descriptor -> registry -> processing request/context/result
-> diagnostics/provenance/capabilities
-> CLI/API/backend integration
It is not the same as MKTT-WP-0011 dataflow workflows. Workflows organize
business-facing processing steps for a document pipeline. The extension
framework organizes how Markitect itself exposes and composes capabilities.
Extension Taxonomy
| Kind | Examples | Primary Contract |
|---|---|---|
| query-engine | selector, JSONPath | document/data in, matches out |
| processor | identity, uppercase, include | fenced block in, processed result out |
| backend | local SQLite index | snapshots/index/search storage |
| reference-provider | section, region, fence, line | address in, content units out |
| validator | schema, contract, section assertion | document/context in, diagnostics out |
| template-engine | deterministic templates | template/data in, Markdown out |
| generation-adapter | provider-neutral assisted generation | request in, generated candidate out |
| cli-group | cache, backend, ref, class | command descriptors or registration hook |
| render-export | future Quarkdown/export adapters | Markdown source in, rendered/exported artifact out |
| document-function | future function layer | function call in, typed document value out |
Canonical Lifecycle
An extension should be describable before it runs:
- Register descriptor.
- Check optional dependencies.
- Check capabilities and policy labels.
- Build processing context.
- Execute operation.
- Normalize result, diagnostics, provenance, and trace data.
- Expose output through library API, CLI, backend, or workflow layer.
The framework should allow deterministic extensions to stay simple. Assisted, external, networked, or filesystem-mutating extensions should declare that explicitly before execution.
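A minimal sketch of that declaration check, assuming safety flags are plain strings compared against a caller-supplied policy. The flag names and the `denied_flags` helper are hypothetical illustrations, not Markitect's API.

```python
# Hypothetical safety flags; the real label vocabulary is not defined in this document.
DETERMINISTIC = frozenset()
ASSISTED_NETWORKED = frozenset({"assisted", "network"})


def denied_flags(declared, allowed):
    """Return the declared safety flags that the caller's policy does not allow."""
    return sorted(set(declared) - set(allowed))


policy = {"network"}  # this caller permits network access only

print(denied_flags(DETERMINISTIC, policy))       # nothing declared, so nothing is denied
print(denied_flags(ASSISTED_NETWORKED, policy))  # "assisted" is declared but not permitted
```

A deterministic extension declares nothing and passes trivially, which is how simple extensions stay simple under this kind of check.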
Descriptor Shape
The first descriptor model should cover:
- stable id
- kind
- version
- summary
- implementation reference
- capability declarations
- optional dependency declarations
- safety flags
- input and output contract names
- diagnostics namespace
- provenance operation prefix
- documentation and example links
- CLI affordances where applicable
Descriptors are not meant to replace implementation modules. They are the small declarative surface that lets Markitect inspect, list, validate, and compose capabilities consistently.
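One way the field list above could map onto a frozen dataclass is sketched here. The class name, field names, defaults, and the `module:attribute` implementation reference are assumptions made for illustration; the real descriptor model may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtensionDescriptor:
    """Hypothetical descriptor shape; not Markitect's actual class."""

    id: str                       # stable id
    kind: str                     # one of the taxonomy kinds, e.g. "query-engine"
    version: str
    summary: str
    implementation: str           # assumed "module:attribute" reference, resolved lazily
    capabilities: tuple = ()
    optional_dependencies: tuple = ()
    safety_flags: frozenset = frozenset()
    input_contract: str = ""
    output_contract: str = ""
    diagnostics_namespace: str = ""
    provenance_prefix: str = ""
    docs: tuple = ()              # documentation and example links
    cli: tuple = ()               # CLI affordances where applicable


selector = ExtensionDescriptor(
    id="query.selector",
    kind="query-engine",
    version="1",
    summary="Selector queries over document sections.",
    implementation="markitect_tool.extensions.query_selector:run",  # ":run" is invented
    diagnostics_namespace="query.selector",
    provenance_prefix="query.selector",
)
```

Keeping the descriptor frozen and free of live objects means it can be listed, validated, and compared without importing the implementation module.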
Processing Model
The canonical processing model should define a small set of shared envelopes:
- ProcessingRequest: operation id, input payload, options, scope
- ProcessingContext: root, source path, namespaces, variables, policy, backend handles, and caller metadata
- ProcessingResult: output payload, diagnostics, provenance, dependencies, trace events, and validity
- ProcessingDiagnostic: severity, code, message, source, help
- ProcessingCapability: declared feature or permission requirement
- ProcessingProvenance: operation, source identity, snapshot/content hashes, dependencies, backend/provider metadata
Subsystem-specific types may remain richer. The canonical model is the bridge, not a forced replacement for every local dataclass.
Processing Request Example
from pathlib import Path
from markitect_tool.extension import ProcessingContext, ProcessingRequest

request = ProcessingRequest(
    operation="query.selector",
    input={"selector": "sections[heading=Decision]"},
    context=ProcessingContext(
        source_path=Path("docs/adr.md"),
        namespaces={"std": "standards"},
        variables={"audience": "internal"},
    ),
    options={"format": "json"},
    scope="document",
)
The request cache key includes operation, input, options, scope, declared capabilities, metadata, and stable context semantics:
- source path
- namespaces
- variables
- policy
- metadata
It intentionally excludes root path, caller name, and live backend handles. Those are execution-environment details. If they matter semantically for an extension, put explicit values in request options or metadata.
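The key derivation described above might look roughly like the following. This is a sketch under assumptions: the function name `request_cache_key`, the dict-based context, and the choice of canonical JSON plus SHA-256 are illustrative, not the framework's actual implementation.

```python
import hashlib
import json

# Context fields the text lists as semantically stable.
STABLE_CONTEXT_FIELDS = ("source_path", "namespaces", "variables", "policy", "metadata")


def request_cache_key(operation, input, options, scope, capabilities, metadata, context):
    """Hash the stable request fields; root, caller, and backend handles are ignored."""
    payload = {
        "operation": operation,
        "input": input,
        "options": options,
        "scope": scope,
        "capabilities": sorted(capabilities),
        "metadata": metadata,
        "context": {name: context.get(name) for name in STABLE_CONTEXT_FIELDS},
    }
    # sort_keys gives a canonical encoding, so dict ordering cannot change the key.
    encoded = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


base = {
    "source_path": "docs/adr.md",
    "namespaces": {"std": "standards"},
    "variables": {"audience": "internal"},
}
args = ("query.selector", {"selector": "sections[heading=Decision]"},
        {"format": "json"}, "document", [], {})

key_a = request_cache_key(*args, {**base, "root": "/work/a", "caller": "cli"})
key_b = request_cache_key(*args, {**base, "root": "/work/b", "caller": "api"})
key_c = request_cache_key(*args, {**base, "variables": {"audience": "public"}})
```

Here `key_a` and `key_b` match because only excluded fields differ, while `key_c` differs because a variable changed.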
Processing Result Example
from markitect_tool.extension import (
    ProcessingProvenance,
    ProcessingResult,
    ProcessingTrace,
)

result = ProcessingResult(
    output={"count": 1},
    provenance=[
        ProcessingProvenance(
            operation="query.selector",
            source_path="docs/adr.md",
            content_hash="sha256:...",
        )
    ],
).with_trace(ProcessingTrace(event="query.done"))
ProcessingResult.valid is derived from diagnostics. Any diagnostic with
severity error makes the result invalid.
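The derived-validity rule can be illustrated with stand-in classes. `Diagnostic` and `Result` below are simplified, hypothetical substitutes for ProcessingDiagnostic and ProcessingResult, and the diagnostic codes are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Diagnostic:
    """Stand-in for ProcessingDiagnostic with only the fields this example needs."""

    severity: str  # e.g. "info", "warning", "error"
    code: str
    message: str


@dataclass
class Result:
    """Stand-in for ProcessingResult; validity is computed, never stored."""

    output: object = None
    diagnostics: list = field(default_factory=list)

    @property
    def valid(self) -> bool:
        # A single error-severity diagnostic is enough to invalidate the result.
        return not any(d.severity == "error" for d in self.diagnostics)


clean = Result(output={"count": 1})
warned = Result(diagnostics=[Diagnostic("warning", "example.slow", "Took a while")])
failed = Result(diagnostics=[Diagnostic("error", "example.missing", "Nothing matched")])
```

Because `valid` is a property, it cannot drift out of sync with the diagnostics list when more diagnostics are appended later.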
Registration Strategy
Start with in-package registration:
markitect_tool/extensions/
    query_selector.py
    query_jsonpath.py
    backend_local_sqlite.py
    processors_builtin.py
Each module exposes one or more descriptors plus a registration function. The root registry can be assembled explicitly at import time or by a small internal discovery list. Package entry points can be added later if external extension packages become a real requirement.
See docs/extension-authoring.md for the extension authoring checklist and
descriptor template.
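As a rough sketch of the in-package strategy, each module could expose a descriptor tuple and a registration function, with the root registry built from an explicit list. Everything here is flattened into one file for illustration; the dict-based descriptors, the `query.jsonpath` id, and the function names are assumptions.

```python
# Stand-ins for the contents of query_selector.py and query_jsonpath.py.
SELECTOR_DESCRIPTORS = ({"id": "query.selector", "kind": "query-engine"},)
JSONPATH_DESCRIPTORS = ({"id": "query.jsonpath", "kind": "query-engine"},)


def register_selector(registry):
    """Registration function a module such as query_selector.py might expose."""
    for descriptor in SELECTOR_DESCRIPTORS:
        registry[descriptor["id"]] = descriptor


def register_jsonpath(registry):
    """Registration function a module such as query_jsonpath.py might expose."""
    for descriptor in JSONPATH_DESCRIPTORS:
        registry[descriptor["id"]] = descriptor


# The small internal discovery list: explicit, ordered, and easy to audit.
BUILTIN_REGISTRARS = (register_selector, register_jsonpath)


def build_registry(registrars=BUILTIN_REGISTRARS):
    """Assemble the root registry by calling each registrar in order."""
    registry = {}
    for register in registrars:
        register(registry)
    return registry


registry = build_registry()
```

Swapping the tuple for entry-point discovery later would only change how `BUILTIN_REGISTRARS` is populated, not the modules themselves.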
Registry Use
Extension registries are optimized for common lookup patterns:
- registry.get("backend.local-sqlite")
- registry.list(kind="query-engine")
- registry.require_capability("fts")
- registry.check_dependencies("jsonpath")
Kinds and capabilities are indexed at registration time, so large registries can avoid repeated full scans for basic discovery.
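A simplified registry with registration-time indexes might look like this. The method names mirror the calls listed above, but the return types, the error types, and the `Descriptor` stand-in are assumptions; the missing module name is deliberately fictional.

```python
import importlib.util
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Stand-in descriptor with only the fields the registry indexes."""

    id: str
    kind: str
    capabilities: tuple = ()
    optional_dependencies: tuple = ()  # importable top-level module names


class Registry:
    def __init__(self):
        self._by_id = {}
        self._by_kind = defaultdict(list)
        self._by_capability = defaultdict(list)

    def register(self, descriptor):
        if descriptor.id in self._by_id:
            raise ValueError(f"duplicate extension id: {descriptor.id}")
        self._by_id[descriptor.id] = descriptor
        # Index once here so later lookups never scan every descriptor.
        self._by_kind[descriptor.kind].append(descriptor)
        for capability in descriptor.capabilities:
            self._by_capability[capability].append(descriptor)

    def get(self, extension_id):
        return self._by_id[extension_id]

    def list(self, kind=None):
        if kind is None:
            return [*self._by_id.values()]
        return [*self._by_kind.get(kind, ())]

    def require_capability(self, capability):
        matches = self._by_capability.get(capability)
        if not matches:
            raise LookupError(f"no extension provides capability: {capability}")
        return [*matches]

    def check_dependencies(self, extension_id):
        """Return the optional dependencies that cannot be imported."""
        return [
            name
            for name in self.get(extension_id).optional_dependencies
            if importlib.util.find_spec(name) is None
        ]


registry = Registry()
registry.register(Descriptor("backend.local-sqlite", "backend", ("fts",)))
registry.register(
    Descriptor("jsonpath", "query-engine", (), ("hypothetical_missing_jsonpath_lib",))
)
```

The trade-off is a little extra memory and registration work in exchange for constant-time discovery by kind or capability.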
Execution Lifecycle
ExtensionExecutor wraps a descriptor factory with deterministic lifecycle
hooks:
- Fetch descriptor.
- Check required optional dependencies.
- Instantiate callable implementation.
- Run before callbacks.
- Execute implementation.
- Normalize result type.
- Append extension.executed trace.
- Run success or failure callbacks.
- Run final after callbacks.
Callbacks are explicit. The framework does not introduce hidden global behavior.
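The lifecycle can be sketched with a small executor whose callbacks are passed in explicitly. This is not the real ExtensionExecutor: the `Executor` class, the dict-based descriptors and results, the callback signatures, and the `processor.uppercase` id are assumptions made for illustration.

```python
class Executor:
    """Simplified stand-in showing the hook order, not the actual ExtensionExecutor."""

    def __init__(self, descriptors, before=(), on_success=(), on_failure=(), after=()):
        self.descriptors = descriptors
        self.before, self.on_success = before, on_success
        self.on_failure, self.after = on_failure, after

    def run(self, extension_id, request):
        descriptor = self.descriptors[extension_id]            # fetch descriptor
        missing = descriptor.get("missing_dependencies", [])   # check optional deps
        if missing:
            raise RuntimeError(f"{extension_id} is missing: {', '.join(missing)}")
        implementation = descriptor["factory"]()               # instantiate callable
        try:
            for callback in self.before:
                callback(extension_id, request)
            raw = implementation(request)
            # Normalize: wrap bare values so every result has the same shape.
            result = raw if isinstance(raw, dict) else {"output": raw}
            result.setdefault("trace", []).append("extension.executed")
        except Exception as error:
            for callback in self.on_failure:
                callback(extension_id, error)
            raise
        else:
            for callback in self.on_success:
                callback(extension_id, result)
            return result
        finally:
            # After callbacks run on both the success and the failure path.
            for callback in self.after:
                callback(extension_id)


events = []
executor = Executor(
    {"processor.uppercase": {"factory": lambda: str.upper}},
    before=[lambda ext, req: events.append("before")],
    on_success=[lambda ext, res: events.append("success")],
    on_failure=[lambda ext, err: events.append("failure")],
    after=[lambda ext: events.append("after")],
)
result = executor.run("processor.uppercase", "hello")
```

Since every hook is a constructor argument, two executors in the same process can carry different callbacks without sharing global state.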
Compatibility Rules
The refactor must preserve:
- current library APIs such as query_document
- current CLI commands and output envelopes
- current diagnostic codes where users may rely on them
- current provenance operation strings unless intentionally deprecated
- optional dependency behavior for JSONPath and future adapters
- cache/index file compatibility unless a migration is documented
The first implementation adds canonical processing envelopes, extension descriptors, registries, lifecycle callbacks, query-engine registry shims, built-in extension descriptors, and CLI command specs while preserving existing public commands.
Characterization Coverage
Before refactoring, lock down:
- selector query and extraction
- optional JSONPath diagnostics
- processor registry behavior and provenance
- backend manifest registry and capability checks
- local SQLite snapshot/index/search behavior
- content reference resolution
- representative CLI command envelopes
- provenance and diagnostic shapes
These tests are deliberately a little redundant with unit tests. Their job is to protect the current public behavior while internals move behind extension descriptors and registries.
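A characterization test for an output envelope could pin the structure rather than the exact values, as in this sketch. The `run_query_stub` function and its envelope keys are invented stand-ins; a real test would call the existing library or CLI entry point and record whatever shape it returns today.

```python
def run_query_stub(selector):
    """Stand-in for the current entry point; the envelope keys here are invented."""
    return {
        "ok": True,
        "operation": "query.selector",
        "matches": [{"heading": "Decision"}],
        "diagnostics": [],
    }


def envelope_shape(value):
    """Reduce a payload to its key and type structure so the test pins shape only."""
    if isinstance(value, dict):
        return {key: envelope_shape(item) for key, item in value.items()}
    if isinstance(value, list):
        return [envelope_shape(value[0])] if value else []
    return type(value).__name__


# Recorded once from the pre-refactor behavior and then kept fixed.
EXPECTED_SHAPE = {
    "ok": "bool",
    "operation": "str",
    "matches": [{"heading": "str"}],
    "diagnostics": [],
}


def test_selector_envelope_shape_is_stable():
    assert envelope_shape(run_query_stub("sections[heading=Decision]")) == EXPECTED_SHAPE


test_selector_envelope_shape_is_stable()
```

A shape-level assertion like this tolerates changes in document content while still failing if a key is renamed, dropped, or changes type during the refactor.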