Internal Extension Framework

Purpose

Markitect has reached the point where optional features are useful but are starting to concentrate wiring in central modules. Query engines, processors, backend stores, references, contract checks, templates, generation adapters, and CLI commands all need some combination of registration, capability metadata, diagnostics, provenance, and optional dependency handling.

The internal extension framework should make those seams explicit without turning the project into a heavy external plugin platform.

Boundary

This framework is about internal extensibility:

feature descriptor -> registry -> processing request/context/result
                   -> diagnostics/provenance/capabilities
                   -> CLI/API/backend integration

It is not the same as MKTT-WP-0011 dataflow workflows. Workflows organize business-facing processing steps for a document pipeline. The extension framework organizes how Markitect itself exposes and composes capabilities.

Extension Taxonomy

| Kind | Examples | Primary Contract |
| --- | --- | --- |
| query-engine | selector, JSONPath | document/data in, matches out |
| processor | identity, uppercase, include | fenced block in, processed result out |
| backend | local SQLite index | snapshots/index/search storage |
| reference-provider | section, region, fence, line | address in, content units out |
| validator | schema, contract, section assertion | document/context in, diagnostics out |
| runtime | context loader, form state, dynamic rules | document/contract/context in, diagnostics and state out |
| assessment-runner | provider-neutral rubric execution | assessment request in, normalized result out |
| policy-gateway | local label gateway, future external auth adapters | subject/action/object in, decision or filtered results out |
| template-engine | deterministic templates | template/data in, Markdown out |
| generation-adapter | provider-neutral assisted generation | request in, generated candidate out |
| source-adapter | EPUB3/PDF/DOCX adapters in external packages | source asset in, normalized Markdown out |
| cli-group | cache, backend, ref, class | command descriptors or registration hook |
| render-export | Quarkdown/export adapters | Markdown source in, rendered/exported artifact descriptor out |
| render-reference-contract | render units, cross-references, TOC, asset manifests | passive manifest in, renderer-planning metadata out |
| document-function | future function layer | function call in, typed document value out |

Canonical Lifecycle

An extension should be describable before it runs:

  1. Register descriptor.
  2. Check optional dependencies.
  3. Check capabilities and policy labels.
  4. Build processing context.
  5. Execute operation.
  6. Normalize result, diagnostics, provenance, and trace data.
  7. Expose output through library API, CLI, backend, or workflow layer.

The framework should allow deterministic extensions to stay simple. Assisted, external, networked, or filesystem-mutating extensions should declare that explicitly before execution.
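One way to picture the "declare before execution" rule is a small gate that compares declared safety flags with what the caller allows. This is only a sketch: the flag spellings and the `blocked_flags` helper are hypothetical and not taken from the Markitect code base.

```python
# Assumed flag vocabulary; the text names the categories but not their spelling.
DECLARABLE_FLAGS = frozenset({"assisted", "external", "network", "fs-write"})


def blocked_flags(declared, allowed):
    """Return the declared safety flags the caller has not allowed, sorted."""
    declared = set(declared)
    unknown = declared - DECLARABLE_FLAGS
    if unknown:
        raise ValueError(f"unknown safety flags: {sorted(unknown)}")
    return sorted(declared - set(allowed))


# A deterministic extension declares nothing, so nothing can block it.
deterministic = blocked_flags(declared=(), allowed=())

# A networked, filesystem-mutating extension stays blocked until both are allowed.
networked = blocked_flags(declared=("network", "fs-write"), allowed=("network",))
```

With this shape, deterministic extensions pay no ceremony cost, while riskier ones cannot run until every declared flag has been allowed.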

Descriptor Shape

The first descriptor model should cover:

  • stable id
  • kind
  • version
  • summary
  • implementation reference
  • capability declarations
  • optional dependency declarations
  • safety flags
  • input and output contract names
  • diagnostics namespace
  • provenance operation prefix
  • documentation and example links
  • CLI affordances where applicable

Descriptors are not meant to replace implementation modules. They are the small declarative surface that lets Markitect inspect, list, validate, and compose capabilities consistently.
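As a rough illustration of the field list above, a descriptor could be a frozen dataclass. The class name, field names, defaults, and the implementation reference string below are assumptions made for this sketch; the real model may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtensionDescriptor:
    """Hypothetical descriptor shape; every field name here is illustrative."""

    id: str                       # stable id, e.g. "query.selector"
    kind: str                     # taxonomy kind, e.g. "query-engine"
    version: str
    summary: str
    implementation: str           # import reference, resolved lazily
    capabilities: tuple[str, ...] = ()
    optional_dependencies: tuple[str, ...] = ()
    safety_flags: frozenset[str] = frozenset()
    input_contract: str = ""
    output_contract: str = ""
    diagnostics_namespace: str = ""
    provenance_prefix: str = ""
    docs: tuple[str, ...] = ()
    cli_commands: tuple[str, ...] = ()

    @property
    def is_deterministic(self) -> bool:
        # Assumption: no declared safety flags means a deterministic extension.
        return not self.safety_flags


selector = ExtensionDescriptor(
    id="query.selector",
    kind="query-engine",
    version="1",
    summary="Selector-based document queries",
    implementation="markitect_tool.extensions.query_selector:run",  # assumed path
    diagnostics_namespace="query.selector",
    provenance_prefix="query.selector",
)
```

Keeping the descriptor frozen and free of live objects means it can be listed, compared, and serialized without importing the implementation module.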

Processing Model

The canonical processing model should define a small set of shared envelopes:

  • ProcessingRequest: operation id, input payload, options, scope
  • ProcessingContext: root, source path, namespaces, variables, policy, backend handles, and caller metadata
  • ProcessingResult: output payload, diagnostics, provenance, dependencies, trace events, and validity
  • ProcessingDiagnostic: severity, code, message, source, help
  • ProcessingCapability: declared feature or permission requirement
  • ProcessingProvenance: operation, source identity, snapshot/content hashes, dependencies, backend/provider metadata

Subsystem-specific types may remain richer. The canonical model is the bridge, not a forced replacement for every local dataclass.

Processing Request Example

from pathlib import Path

from markitect_tool.extension import ProcessingContext, ProcessingRequest

request = ProcessingRequest(
    operation="query.selector",
    input={"selector": "sections[heading=Decision]"},
    context=ProcessingContext(
        source_path=Path("docs/adr.md"),
        namespaces={"std": "standards"},
        variables={"audience": "internal"},
    ),
    options={"format": "json"},
    scope="document",
)

The request cache key includes operation, input, options, scope, declared capabilities, metadata, and stable context semantics:

  • source path
  • namespaces
  • variables
  • policy
  • metadata

It intentionally excludes root path, caller name, and live backend handles. Those are execution-environment details. If they matter semantically for an extension, put explicit values in request options or metadata.
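The inclusion and exclusion rules above can be sketched with plain dictionaries. The `request_cache_key` helper, the field tuples, and the `sha256:` prefix are assumptions for illustration, not the framework's actual key derivation.

```python
import hashlib
import json

# Fields that contribute to the key, following the lists above.
REQUEST_FIELDS = ("operation", "input", "options", "scope", "capabilities", "metadata")
STABLE_CONTEXT_FIELDS = ("source_path", "namespaces", "variables", "policy", "metadata")


def request_cache_key(request: dict, context: dict) -> str:
    """Hash only stable, semantic fields; root, caller, and handles are ignored."""
    payload = {
        "request": {name: request.get(name) for name in REQUEST_FIELDS},
        "context": {name: context.get(name) for name in STABLE_CONTEXT_FIELDS},
    }
    # sort_keys makes the encoding independent of dict insertion order.
    encoded = json.dumps(payload, sort_keys=True, default=str)
    return "sha256:" + hashlib.sha256(encoded.encode("utf-8")).hexdigest()


request = {
    "operation": "query.selector",
    "input": {"selector": "sections[heading=Decision]"},
    "options": {"format": "json"},
    "scope": "document",
}
local = {
    "source_path": "docs/adr.md",
    "variables": {"audience": "internal"},
    "root": "/home/alice/repo",   # execution-environment detail, not hashed
    "caller": "cli",              # execution-environment detail, not hashed
}
ci = {**local, "root": "/builds/1234", "caller": "api"}
public = {**local, "variables": {"audience": "public"}}
```

Here `local` and `ci` hash to the same key because they differ only in excluded fields, while `public` changes a variable and therefore gets a different key.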

Processing Result Example

from markitect_tool.extension import (
    ProcessingProvenance,
    ProcessingResult,
    ProcessingTrace,
)

result = ProcessingResult(
    output={"count": 1},
    provenance=[
        ProcessingProvenance(
            operation="query.selector",
            source_path="docs/adr.md",
            content_hash="sha256:...",
        )
    ],
).with_trace(ProcessingTrace(event="query.done"))

ProcessingResult.valid is derived from diagnostics. Any diagnostic with severity error makes the result invalid.
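A minimal sketch of that derivation uses stand-in classes rather than the real envelopes. The `Diagnostic` and `Result` names and the non-error severity values are assumptions; only the "error invalidates" rule comes from the text above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Diagnostic:
    severity: str   # "error" comes from the text; "warning" is an assumed level
    code: str
    message: str


@dataclass
class Result:
    output: Any = None
    diagnostics: list[Diagnostic] = field(default_factory=list)

    @property
    def valid(self) -> bool:
        # Derived on access, never stored: any error diagnostic invalidates.
        return not any(d.severity == "error" for d in self.diagnostics)


clean = Result(output={"count": 1})
warned = Result(diagnostics=[Diagnostic("warning", "demo.slow", "took a while")])
failed = Result(diagnostics=[Diagnostic("error", "demo.missing", "not found")])
```

Deriving validity as a property keeps it from drifting out of sync with the diagnostics list when diagnostics are appended later.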

Registration Strategy

Start with in-package registration:

markitect_tool/extensions/
  query_selector.py
  query_jsonpath.py
  backend_local_sqlite.py
  processors_builtin.py

Each module exposes one or more descriptors plus a registration function. The root registry can be assembled explicitly at import time or by a small internal discovery list.

Source adapters are the first external package-discovery slice and use the markitect_tool.source_adapters entry point group defined in docs/source-adapter-contract.md. Render/export adapters use markitect_tool.render_export_adapters and keep concrete renderer execution in external packages. Render reference and asset manifests remain built-in passive contracts; they do not need adapter discovery.

See docs/extension-authoring.md for the extension authoring checklist and descriptor template.

Registry Use

Extension registries are optimized for common lookup patterns:

  • registry.get("backend.local-sqlite")
  • registry.list(kind="query-engine")
  • registry.require_capability("fts")
  • registry.check_dependencies("jsonpath")

Kinds and capabilities are indexed at registration time, so large registries can avoid repeated full scans for basic discovery.
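Indexing at registration time can be sketched with two secondary dictionaries. The `Registry` class below is a simplified stand-in: its method signatures, return types, the `query.jsonpath` id, and the pairing of `fts` with the SQLite backend are assumptions, not the actual registry API.

```python
from collections import defaultdict


class Registry:
    """Toy registry that maintains kind and capability indexes on register()."""

    def __init__(self):
        self._by_id = {}
        self._by_kind = defaultdict(list)
        self._by_capability = defaultdict(list)

    def register(self, ext_id, kind, capabilities=()):
        if ext_id in self._by_id:
            raise ValueError(f"duplicate extension id: {ext_id}")
        self._by_id[ext_id] = {"id": ext_id, "kind": kind}
        # Index once here so lookups never scan every descriptor.
        self._by_kind[kind].append(ext_id)
        for capability in capabilities:
            self._by_capability[capability].append(ext_id)

    def get(self, ext_id):
        return self._by_id[ext_id]

    def list(self, kind=None):
        if kind is None:
            return [*self._by_id]
        # .get avoids creating empty index entries for unknown kinds.
        return [*self._by_kind.get(kind, ())]

    def require_capability(self, capability):
        providers = self._by_capability.get(capability, ())
        if not providers:
            raise LookupError(f"no extension provides capability: {capability}")
        return [*providers]


registry = Registry()
registry.register("query.selector", "query-engine")
registry.register("query.jsonpath", "query-engine")
registry.register("backend.local-sqlite", "backend", capabilities=("fts",))
```

Lookups by kind or capability then cost one dictionary access plus a copy of the matching ids, regardless of how many extensions are registered.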

The same registry is exposed on the CLI for practical discovery:

mkt extension list
mkt extension inspect memory.context-package
mkt extension commands

This makes the extension catalog part of the user-visible contract. When a capability gains a CLI command, the descriptor should declare that command so generated docs and audits can compare intent with the live surface.

Execution Lifecycle

ExtensionExecutor wraps a descriptor factory with deterministic lifecycle hooks:

  1. Fetch descriptor.
  2. Check required optional dependencies.
  3. Instantiate callable implementation.
  4. Run before callbacks.
  5. Execute implementation.
  6. Normalize result type.
  7. Append extension.executed trace.
  8. Run success or failure callbacks.
  9. Run final after callbacks.

Callbacks are explicit. The framework does not introduce hidden global behavior.
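The nine steps above can be approximated in a single function. This is a simplified sketch, not the real ExtensionExecutor: the dictionary-shaped descriptor, the `requires` and `factory` keys, and the callback signatures are all assumptions.

```python
from importlib.util import find_spec


def execute(descriptor, request, *, before=(), on_success=(), on_failure=(), after=()):
    """Run one extension through an explicit, ordered lifecycle."""
    # Steps 1-2: the descriptor is already fetched; check optional modules.
    missing = [name for name in descriptor.get("requires", ()) if find_spec(name) is None]
    if missing:
        raise ImportError(f"{descriptor['id']} needs optional modules: {missing}")
    implementation = descriptor["factory"]()                # step 3
    for callback in before:                                 # step 4
        callback(descriptor, request)
    try:
        raw = implementation(request)                       # step 5
        # Step 6: wrap bare return values in a result envelope.
        result = raw if isinstance(raw, dict) and "output" in raw else {"output": raw}
        # Step 7: append the execution trace event.
        result = {**result, "trace": [*result.get("trace", []), "extension.executed"]}
    except Exception as error:
        for callback in on_failure:                         # step 8 (failure)
            callback(descriptor, request, error)
        raise
    else:
        for callback in on_success:                         # step 8 (success)
            callback(descriptor, request, result)
        return result
    finally:
        for callback in after:                              # step 9
            callback(descriptor, request)


events = []
uppercase = {
    "id": "processor.uppercase",
    "requires": (),
    "factory": lambda: (lambda request: request["input"].upper()),
}
outcome = execute(
    uppercase,
    {"input": "abc"},
    before=[lambda d, r: events.append("before")],
    on_success=[lambda d, r, result: events.append("success")],
    after=[lambda d, r: events.append("after")],
)
```

Because callbacks are passed in as arguments, a call site shows everything that will run around the implementation, which matches the "no hidden global behavior" rule.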

Compatibility Rules

The refactor must preserve:

  • current library APIs such as query_document
  • current CLI commands and output envelopes
  • current diagnostic codes where users may rely on them
  • current provenance operation strings unless intentionally deprecated
  • optional dependency behavior for JSONPath and future adapters
  • cache/index file compatibility unless a migration is documented

The first implementation adds canonical processing envelopes, extension descriptors, registries, lifecycle callbacks, query-engine registry shims, built-in extension descriptors, and CLI command specs while preserving existing public commands.
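Preserving optional dependency behavior usually means reporting a missing module as a diagnostic rather than failing at import time. The sketch below shows that pattern with a hypothetical `check_optional_dependency` helper and a made-up diagnostic code; the real JSONPath module name and codes are not stated here, so none are assumed.

```python
from importlib.util import find_spec


def check_optional_dependency(module_name, *, code, feature):
    """Return None when the top-level module is importable, else a diagnostic dict."""
    if find_spec(module_name) is not None:
        return None
    return {
        "severity": "error",
        "code": code,
        "message": f"{feature} requires the optional module '{module_name}'",
        "help": f"Install '{module_name}' to enable {feature}.",
    }


# "json" ships with Python, so this check passes.
present = check_optional_dependency("json", code="demo.dep-missing", feature="JSON output")

# A module that is not installed yields a diagnostic instead of an exception.
absent = check_optional_dependency(
    "markitect_missing_optional_dep",   # deliberately nonexistent name
    code="demo.dep-missing",
    feature="demo queries",
)
```

Returning a diagnostic keeps the failure inside the normal result envelope, so existing callers that inspect diagnostic codes continue to work.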

Characterization Coverage

Before refactoring, lock down:

  • selector query and extraction
  • optional JSONPath diagnostics
  • processor registry behavior and provenance
  • backend manifest registry and capability checks
  • local SQLite snapshot/index/search behavior
  • content reference resolution
  • representative CLI command envelopes
  • provenance and diagnostic shapes

These tests are deliberately a little redundant with unit tests. Their job is to protect the current public behavior while internals move behind extension descriptors and registries.
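One inexpensive way to lock down envelope and diagnostic shapes is to compare structure rather than values. The `envelope_shape` helper and the sample envelope below are hypothetical; real characterization tests would capture actual command output.

```python
def envelope_shape(value):
    """Reduce a JSON-like value to its structure: keys and type names only."""
    if isinstance(value, dict):
        return {key: envelope_shape(item) for key, item in sorted(value.items())}
    if isinstance(value, list):
        # Assumption: lists are homogeneous, so the first item is representative.
        return [envelope_shape(value[0])] if value else []
    return type(value).__name__


# Stand-in for a captured CLI envelope; the keys are illustrative only.
captured = {
    "valid": False,
    "diagnostics": [{"severity": "error", "code": "demo.missing", "message": "not found"}],
    "output": {"count": 0},
}

recorded_shape = {
    "diagnostics": [{"code": "str", "message": "str", "severity": "str"}],
    "output": {"count": "int"},
    "valid": "bool",
}
```

A test can then assert that `envelope_shape(captured)` equals `recorded_shape`, which fails when a key is renamed or a type changes but tolerates different values.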