
# Internal Extension Authoring

## Purpose

This guide describes how to add a new internal Markitect extension without
turning central modules into the main integration surface.

Use this for internal query engines, processors, backend/index stores,
reference providers, validators, template/generation adapters, CLI command
groups, render/export adapters, and future document functions.

Source-format adapters are external package extensions. Use
`docs/source-adapter-contract.md` for the source adapter protocol, entry point
group, descriptor shape, and contract-test expectations.

## Recommended Shape

Each extension should have:

- implementation module
- descriptor or descriptor factory
- focused tests
- characterization coverage if it changes existing behavior
- documentation or example link
- diagnostic namespace
- provenance operation prefix
- optional dependency declaration
- capability and safety declarations

Prefer this shape:

```text
src/markitect_tool/<area>/<feature>.py
tests/test_<area>_<feature>.py
docs/<feature>.md
```

If the extension is cross-cutting, register it from
`markitect_tool.extension.builtins` or a future internal discovery module rather
than importing it from many central files.

## Descriptor Template

```python
from markitect_tool.extension import ExtensionDescriptor, ProcessingCapability


def my_extension_descriptor() -> ExtensionDescriptor:
    return ExtensionDescriptor(
        id="query.example",
        kind="query-engine",
        summary="Example query engine.",
        capabilities=[
            ProcessingCapability(id="ast", kind="read"),
        ],
        input_contract="Document + example expression",
        output_contract="QueryMatch[]",
        diagnostics_namespace="query.example",
        provenance_prefix="query.example",
        cli={"commands": ["mkt query --engine example"]},
        docs=["docs/example-query.md"],
        examples=["examples/query/example.md"],
    )
```

## Optional Dependencies

Declare optional dependencies in descriptors:

```python
from markitect_tool.extension import OptionalDependency

OptionalDependency(
    name="jsonpath_ng",
    package="jsonpath-ng",
    extra="query",
    required=True,
    purpose="Evaluate JSONPath expressions.",
)
```

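If a dependency is missing, return a structured diagnostic, such as
`extension.missing_dependency`, that names the missing package and the extra
that provides it. Do not fail with an unexplained `ImportError`.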
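Here is a minimal sketch of such a check. It assumes the extension probes the top-level import name with `importlib.util.find_spec` before importing anything. The `Diagnostic` dataclass and `check_dependency` helper are hypothetical stand-ins, not Markitect's own types. Real code should emit Markitect's `Diagnostic` or `ProcessingResult.from_error` instead.

```python
from __future__ import annotations

import importlib.util
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    # Hypothetical stand-in for Markitect's Diagnostic type.
    code: str
    message: str
    severity: str = "error"


def check_dependency(name: str, package: str, extra: str) -> Diagnostic | None:
    """Return a structured diagnostic if top-level module `name` is missing."""
    # find_spec locates the module without executing it.
    if importlib.util.find_spec(name) is not None:
        return None
    return Diagnostic(
        code="extension.missing_dependency",
        message=(
            f"Optional dependency '{package}' is not installed "
            f"(provided by the '{extra}' extra)."
        ),
    )
```

With the `OptionalDependency` example above, the call would be `check_dependency("jsonpath_ng", "jsonpath-ng", "query")`. The extension then returns the diagnostic instead of letting the import fail.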

## Processing Envelopes

Use canonical processing envelopes where an extension needs a shared execution
boundary:

- `ProcessingRequest`
- `ProcessingContext`
- `ProcessingResult`
- `ProcessingCapability`
- `ProcessingProvenance`
- `ProcessingTrace`

Subsystem-specific dataclasses may remain richer. The canonical model is the
bridge that lets callbacks, registries, diagnostics, provenance, and future
policy checks interact consistently.

### Minimal Runnable Extension

```python
from markitect_tool.extension import (
    ExtensionDescriptor,
    ExtensionExecutor,
    ExtensionRegistry,
    ProcessingRequest,
    ProcessingResult,
)


# Extension body: receives the canonical request and returns a canonical result.
def run_example(request: ProcessingRequest) -> ProcessingResult:
    name = request.input.get("name", "world")
    return ProcessingResult(output=f"Hello, {name}")


descriptor = ExtensionDescriptor(
    id="example.hello",
    kind="example",
    summary="Small example extension.",
    # Factory returning the callable that the executor invokes.
    factory=lambda: run_example,
)

# Register the descriptor, then run it through the shared executor boundary.
registry = ExtensionRegistry([descriptor])
result = ExtensionExecutor(registry).execute(
    "example.hello",
    ProcessingRequest(operation="example.hello", input={"name": "Markitect"}),
)
```

Use this executor boundary when callbacks, dependency checks, trace events, or
future policy checks matter. For tiny deterministic helpers, it is still fine to
keep the existing direct function API and expose a descriptor alongside it.

### Cache-Key Rules

`ProcessingRequest.cache_key` includes:

- operation
- input
- stable context material
- options
- scope
- declared capabilities
- request metadata

Stable context material includes source path, namespaces, variables, policy, and
metadata. It does not include workspace root, caller, or live backend handles.
This keeps cache keys portable while avoiding collisions for context-sensitive
operations.
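Here is a minimal sketch of these rules. It assumes every field is JSON-serializable and that the context is a plain dictionary. The `make_cache_key` function and its field names are hypothetical and do not reproduce what `ProcessingRequest.cache_key` does internally. The sketch only shows which material goes into the hash and which is dropped.

```python
import hashlib
import json
from typing import Any

# Context fields treated as stable: they affect results but not portability.
STABLE_CONTEXT_FIELDS = ("source_path", "namespaces", "variables", "policy", "metadata")


def make_cache_key(
    operation: str,
    input_value: Any,
    context: dict[str, Any],
    options: dict[str, Any],
    scope: Any,
    capabilities: list[str],
    metadata: dict[str, Any],
) -> str:
    # Drop everything else from the context (workspace root, caller,
    # live backend handles) so the key is the same on any machine.
    stable_context = {k: context[k] for k in STABLE_CONTEXT_FIELDS if k in context}
    material = {
        "operation": operation,
        "input": input_value,
        "context": stable_context,
        "options": options,
        "scope": scope,
        # Declared capabilities are treated as order-insensitive here.
        "capabilities": sorted(capabilities),
        "metadata": metadata,
    }
    # sort_keys gives a canonical encoding regardless of dict insertion order;
    # json.dumps raises TypeError on non-serializable values instead of guessing.
    encoded = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()
```

Two requests that differ only in workspace root or caller produce the same key. A request with a different source path produces a different key.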

## Diagnostics

Diagnostics should be:

- stable enough for tests and callers
- namespaced by subsystem or extension
- explicit about optional dependency failures
- tied to source locations where possible
- emitted as `Diagnostic` or `ProcessingResult.from_error`

Recommended diagnostic code format:

```text
<extension-kind>.<condition>
query.invalid_jsonpath
processor.unknown
extension.missing_dependency
backend.local_sqlite.invalid_fts_query
```
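Tests and callers match on these codes, so a small check can keep them consistent. The sketch below is hypothetical. It assumes codes are lowercase, dot-separated segments of letters, digits, and underscores, with at least a namespace and a condition. Every example above fits that shape.

```python
import re

# Two or more lowercase segments joined by dots, e.g. "query.invalid_jsonpath"
# or "backend.local_sqlite.invalid_fts_query".
DIAGNOSTIC_CODE = re.compile(r"[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+")


def is_valid_code(code: str, namespace: str) -> bool:
    """Check the code's shape and that it sits under the given namespace."""
    # fullmatch rejects trailing junk, including a trailing newline.
    return bool(DIAGNOSTIC_CODE.fullmatch(code)) and code.startswith(namespace + ".")
```

Pass the namespace the extension declares in `diagnostics_namespace`. Framework-level codes such as `extension.missing_dependency` are checked against `extension` instead.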

## Provenance

Every extension that transforms, queries, reads, writes, generates, or indexes
content should expose provenance. Use a stable operation prefix:

```text
query.selector
query.jsonpath
processor.include
local_snapshot_store.put_file
```

Include source path, content hash, snapshot id, backend/provider id, and
dependencies when known.
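Here is a minimal sketch of a provenance record carrying those fields. The `ProvenanceRecord` dataclass and `record_for` helper are hypothetical stand-ins. Markitect's own envelope is `ProcessingProvenance`, and its fields may differ. SHA-256 is an assumed choice of content hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    # Hypothetical shape; fields stay None (or empty) when not known.
    operation: str
    source_path: Optional[str] = None
    content_hash: Optional[str] = None
    snapshot_id: Optional[str] = None
    provider_id: Optional[str] = None
    dependencies: tuple[str, ...] = ()


def record_for(operation: str, source_path: str, content: bytes) -> ProvenanceRecord:
    """Build a record for content read from `source_path`."""
    # Prefix the digest with its algorithm so the hash stays self-describing.
    digest = "sha256:" + hashlib.sha256(content).hexdigest()
    return ProvenanceRecord(operation=operation, source_path=source_path, content_hash=digest)
```

For example, `record_for("processor.include", "docs/intro.md", data)` records the stable `processor.include` prefix together with the hash of exactly the bytes that were processed. A snapshot or backend id can be added with `dataclasses.replace` once it is known.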

## Safety And Policy

Descriptors should declare safety-relevant behavior, such as whether the
extension:

- reads files
- writes local cache
- writes user output files
- accesses network
- invokes external process
- calls assisted-generation provider
- transmits content outside the local process

The initial framework records this metadata. Later policy layers can enforce it.
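A later policy layer could enforce these declarations by modeling them as flags and comparing an extension's declared set against what a run allows. Everything here is hypothetical: the `Safety` flag type, the `check_policy` function, and the flag names. The sketch only illustrates the split between recording now and enforcing later.

```python
from enum import Flag, auto


class Safety(Flag):
    # One flag per safety-relevant behavior listed above.
    READS_FILES = auto()
    WRITES_LOCAL_CACHE = auto()
    WRITES_USER_OUTPUT = auto()
    ACCESSES_NETWORK = auto()
    INVOKES_PROCESS = auto()
    CALLS_GENERATION_PROVIDER = auto()
    TRANSMITS_CONTENT = auto()


def check_policy(declared: Safety, allowed: Safety) -> Safety:
    """Return declared behaviors the policy does not allow (an empty, falsy flag if none)."""
    return declared & ~allowed
```

A policy that allows only file reads and cache writes would reject an extension declaring `Safety.READS_FILES | Safety.ACCESSES_NETWORK`. In that case `check_policy` returns `Safety.ACCESSES_NETWORK`.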

## CLI Affordances

If an extension exposes CLI behavior, declare it in `descriptor.cli`:

```python
cli={"commands": ["mkt cache index", "mkt search"]}
```

`markitect_tool.cli.extensions.collect_cli_command_specs()` can inspect these
affordances without importing Click command implementations.
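Declaring commands as plain strings means they can be read without importing Click. The sketch below is hypothetical and does not reproduce `collect_cli_command_specs()`. It assumes each command string starts with `mkt` and shows how descriptor metadata alone can be grouped by top-level command.

```python
from collections import defaultdict


def group_commands(cli_blocks: list[dict[str, list[str]]]) -> dict[str, list[str]]:
    """Group declared command strings by the first word after `mkt`."""
    groups: dict[str, list[str]] = defaultdict(list)
    for cli in cli_blocks:
        for command in cli.get("commands", []):
            words = command.split()
            # Skip the program name, then key on the top-level command.
            if len(words) >= 2 and words[0] == "mkt":
                groups[words[1]].append(command)
    return dict(groups)
```

Only strings are parsed here, so no command module is imported.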

## Testing Checklist

Add tests for:

- descriptor serialization
- registry lookup and duplicate handling
- missing optional dependency diagnostics
- canonical result validity
- provenance shape
- CLI output envelope if public commands are exposed
- compatibility shim if replacing an existing API

When refactoring an existing feature, add characterization tests first, then
migrate implementation behind descriptors or registries.
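Here is a minimal sketch of that order, using a hypothetical `legacy_slug` helper. First, pin its current outputs as literal expectations. Then require the replacement (`new_slug`, also hypothetical) to reproduce them before it moves behind a descriptor. In the repository these checks would live in `tests/test_<area>_<feature>.py`.

```python
# Hypothetical legacy helper whose current behavior must be preserved.
def legacy_slug(title: str) -> str:
    return "-".join(title.lower().split())


# Characterization: literal outputs recorded from the legacy helper
# before any refactoring starts.
CHARACTERIZATION = {
    "Hello World": "hello-world",
    "  Leading and trailing  ": "leading-and-trailing",
    "Already-slugged": "already-slugged",
    "": "",
}


# Hypothetical replacement that will later be exposed through a descriptor.
def new_slug(title: str) -> str:
    return "-".join(part for part in title.lower().split(" ") if part)


def assert_matches_characterization(fn) -> None:
    for given, expected in CHARACTERIZATION.items():
        assert fn(given) == expected, (given, fn(given), expected)


# Check the legacy helper first, then the replacement.
assert_matches_characterization(legacy_slug)
assert_matches_characterization(new_slug)
```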

## Boundary With Workflows

Internal extensions describe what Markitect can do. Workflows describe how a
user combines capabilities for a concrete document pipeline.

An extension may expose a workflow step later, but it should not depend on the
workflow engine to be useful from the library or CLI.