Internal Extension Authoring

Purpose

This guide describes how to add a new internal Markitect extension without turning central modules into the main integration surface.

Use this for internal query engines, processors, backend/index stores, reference providers, validators, template/generation adapters, CLI command groups, render/export adapters, and future document functions.

Each extension should have:

  • implementation module
  • descriptor or descriptor factory
  • focused tests
  • characterization coverage if it changes existing behavior
  • documentation or example link
  • diagnostic namespace
  • provenance operation prefix
  • optional dependency declaration
  • capability and safety declarations

Prefer this shape:

src/markitect_tool/<area>/<feature>.py
tests/test_<area>_<feature>.py
docs/<feature>.md

If the extension is cross-cutting, register it from markitect_tool.extension.builtins or a future internal discovery module rather than importing it from many central files.

Descriptor Template

from markitect_tool.extension import ExtensionDescriptor, ProcessingCapability


def my_extension_descriptor() -> ExtensionDescriptor:
    return ExtensionDescriptor(
        id="query.example",
        kind="query-engine",
        summary="Example query engine.",
        capabilities=[
            ProcessingCapability(id="ast", kind="read"),
        ],
        input_contract="Document + example expression",
        output_contract="QueryMatch[]",
        diagnostics_namespace="query.example",
        provenance_prefix="query.example",
        cli={"commands": ["mkt query --engine example"]},
        docs=["docs/example-query.md"],
        examples=["examples/query/example.md"],
    )

Optional Dependencies

Declare optional dependencies in descriptors:

from markitect_tool.extension import OptionalDependency

OptionalDependency(
    name="jsonpath_ng",
    package="jsonpath-ng",
    extra="query",
    required=True,
    purpose="Evaluate JSONPath expressions.",
)

If a dependency is missing, return a structured diagnostic. Do not fail with an unexplained import error.
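
A minimal sketch of that check, using only the standard library. The `MissingDependencyDiagnostic` dataclass and the `check_dependency` helper are hypothetical stand-ins, not Markitect APIs; the real diagnostic type may carry different fields. The code `extension.missing_dependency` follows the diagnostic naming used elsewhere in this guide.

```python
from dataclasses import dataclass
from importlib.util import find_spec
from typing import Optional


@dataclass(frozen=True)
class MissingDependencyDiagnostic:
    """Hypothetical stand-in for a structured Markitect diagnostic."""

    code: str
    message: str


def check_dependency(
    name: str, package: str, extra: str
) -> Optional[MissingDependencyDiagnostic]:
    """Return None if `name` is importable, else a diagnostic naming the package."""
    try:
        available = find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for dotted names with a missing parent package.
        available = False
    if available:
        return None
    return MissingDependencyDiagnostic(
        code="extension.missing_dependency",
        message=(
            f"Module '{name}' is not installed; "
            f"install '{package}' (extra: '{extra}')."
        ),
    )
```

Probing with `find_spec` avoids importing the module just to learn that it is absent, so the caller can report the problem before any work starts.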

Processing Envelopes

Use canonical processing envelopes where an extension needs a shared execution boundary:

  • ProcessingRequest
  • ProcessingContext
  • ProcessingResult
  • ProcessingCapability
  • ProcessingProvenance
  • ProcessingTrace

Subsystem-specific dataclasses may remain richer. The canonical model is the bridge that lets callbacks, registries, diagnostics, provenance, and future policy checks interact consistently.

Minimal Runnable Extension

from markitect_tool.extension import (
    ExtensionDescriptor,
    ExtensionExecutor,
    ExtensionRegistry,
    ProcessingRequest,
    ProcessingResult,
)


def run_example(request: ProcessingRequest) -> ProcessingResult:
    # Fall back to "world" when the caller supplies no name.
    name = request.input.get("name", "world")
    return ProcessingResult(output=f"Hello, {name}")


descriptor = ExtensionDescriptor(
    id="example.hello",
    kind="example",
    summary="Small example extension.",
    factory=lambda: run_example,
)

# Register the descriptor, then execute it by id through the executor.
registry = ExtensionRegistry([descriptor])
result = ExtensionExecutor(registry).execute(
    "example.hello",
    ProcessingRequest(operation="example.hello", input={"name": "Markitect"}),
)

Use this executor boundary when callbacks, dependency checks, trace events, or future policy checks matter. For tiny deterministic helpers, it is still fine to keep the existing direct function API and expose a descriptor alongside it.

Cache-Key Rules

ProcessingRequest.cache_key includes:

  • operation
  • input
  • stable context material
  • options
  • scope
  • declared capabilities
  • request metadata

Stable context material includes source path, namespaces, variables, policy, and metadata. It does not include workspace root, caller, or live backend handles. This keeps cache keys portable while avoiding collisions for context-sensitive operations.
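
The rules above can be sketched as follows. This is an illustration, not the actual `ProcessingRequest.cache_key` implementation: the `stable_cache_key` function, the dict-shaped context, and the choice of SHA-256 over canonical JSON are all assumptions.

```python
import hashlib
import json
from typing import Any, Mapping, Sequence

# Context fields treated as stable, per the list above (assumed key names).
STABLE_CONTEXT_FIELDS = ("source_path", "namespaces", "variables", "policy", "metadata")


def stable_cache_key(
    operation: str,
    input_data: Any,
    context: Mapping[str, Any],
    options: Mapping[str, Any],
    scope: Any,
    capabilities: Sequence[str],
    metadata: Mapping[str, Any],
) -> str:
    """Hash only portable material; workspace root, caller, and handles are skipped."""
    stable_context = {k: context[k] for k in STABLE_CONTEXT_FIELDS if k in context}
    payload = {
        "operation": operation,
        "input": input_data,
        "context": stable_context,
        "options": dict(options),
        "scope": scope,
        "capabilities": sorted(capabilities),
        "metadata": dict(metadata),
    }
    # sort_keys yields a canonical encoding, so equal requests hash equally.
    encoded = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()
```

Under this sketch, two requests that differ only in workspace root produce the same key, while a different source path produces a different one.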

Diagnostics

Diagnostics should be:

  • stable enough for tests and callers
  • namespaced by subsystem or extension
  • explicit about optional dependency failures
  • tied to source locations where possible
  • emitted as a Diagnostic or via ProcessingResult.from_error

Recommended format for diagnostic codes:

<extension-kind>.<condition>
query.invalid_jsonpath
processor.unknown
extension.missing_dependency
backend.local_sqlite.invalid_fts_query
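
One way to keep codes stable in tests is to validate their shape. The pattern below is an assumption inferred from the examples above (lowercase segments joined by dots, at least two of them); `is_valid_diagnostic_code` is a hypothetical helper, not part of Markitect.

```python
import re

# Two or more lowercase segments joined by dots, e.g. "query.invalid_jsonpath".
_CODE_PATTERN = re.compile(r"[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+")


def is_valid_diagnostic_code(code: str) -> bool:
    """Check that a code is namespaced and uses the dotted lowercase style."""
    return _CODE_PATTERN.fullmatch(code) is not None
```

A test suite could run every code an extension emits through such a check to catch un-namespaced or inconsistently cased codes early.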

Provenance

Every extension that transforms, queries, reads, writes, generates, or indexes content should expose provenance. Use a stable operation prefix:

query.selector
query.jsonpath
processor.include
local_snapshot_store.put_file

Include source path, content hash, snapshot id, backend/provider id, and dependencies when known.
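
A sketch of what such a record could look like. `ProvenanceRecord` and `provenance_for_content` are hypothetical names used for illustration; the real `ProcessingProvenance` type may use different fields, and SHA-256 is an assumed hash choice.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record; optional fields stay None when not known."""

    operation: str
    source_path: Optional[str] = None
    content_hash: Optional[str] = None
    snapshot_id: Optional[str] = None
    provider_id: Optional[str] = None
    dependencies: Tuple[str, ...] = ()


def provenance_for_content(
    operation: str, source_path: str, content: bytes
) -> ProvenanceRecord:
    """Build a record whose content hash lets later runs detect changed input."""
    return ProvenanceRecord(
        operation=operation,
        source_path=source_path,
        content_hash="sha256:" + hashlib.sha256(content).hexdigest(),
    )
```

Fields that are not known are left unset rather than guessed, which matches the "when known" wording above.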

Safety And Policy

Descriptors should declare safety-relevant behavior:

  • reads files
  • writes local cache
  • writes user output files
  • accesses network
  • invokes external process
  • calls assisted-generation provider
  • transmits content outside the local process

The initial framework only records this metadata; it does not enforce it. Later policy layers can enforce it.
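
A sketch of how a later policy layer might consume these declarations. The `Safety` flag names and the `violations` function are hypothetical; this guide does not specify how descriptors encode safety metadata.

```python
from enum import Flag, auto


class Safety(Flag):
    """Hypothetical flags mirroring the declarations listed above."""

    NONE = 0
    READS_FILES = auto()
    WRITES_LOCAL_CACHE = auto()
    WRITES_USER_OUTPUT = auto()
    ACCESSES_NETWORK = auto()
    INVOKES_EXTERNAL_PROCESS = auto()
    CALLS_GENERATION_PROVIDER = auto()
    TRANSMITS_CONTENT = auto()


def violations(declared: Safety, allowed: Safety) -> Safety:
    """Return the declared behaviors that the policy does not allow."""
    return declared & ~allowed
```

For example, an extension declaring `READS_FILES | ACCESSES_NETWORK` under a policy that allows only file reads and local cache writes would yield `ACCESSES_NETWORK` as the single violation.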

CLI Affordances

If an extension exposes CLI behavior, declare it in descriptor.cli:

cli={"commands": ["mkt cache index", "mkt search"]}

markitect_tool.cli.extensions.collect_cli_command_specs() can inspect these affordances without importing Click command implementations.

Testing Checklist

Add tests for:

  • descriptor serialization
  • registry lookup and duplicate handling
  • missing optional dependency diagnostics
  • canonical result validity
  • provenance shape
  • CLI output envelope if public commands are exposed
  • compatibility shim if replacing an existing API

When refactoring an existing feature, add characterization tests first, then migrate implementation behind descriptors or registries.

Boundary With Workflows

Internal extensions describe what Markitect can do. Workflows describe how a user combines capabilities for a concrete document pipeline.

An extension may expose a workflow step later, but it should not depend on the workflow engine to be useful from the library or CLI.