Extensible canonical internal processing refactoring

This commit is contained in:
2026-05-04 11:06:11 +02:00
parent 4a16ccf1e1
commit d977f9e67c
20 changed files with 1815 additions and 16 deletions

178
docs/extension-authoring.md Normal file
View File

@@ -0,0 +1,178 @@
# Internal Extension Authoring
## Purpose
This guide describes how to add a new internal Markitect extension without
turning central modules into the main integration surface.
Use this for internal query engines, processors, backend/index stores,
reference providers, validators, template/generation adapters, CLI command
groups, render/export adapters, and future document functions.
## Recommended Shape
Each extension should have:
- implementation module
- descriptor or descriptor factory
- focused tests
- characterization coverage if it changes existing behavior
- documentation or example link
- diagnostic namespace
- provenance operation prefix
- optional dependency declaration
- capability and safety declarations
Prefer this shape:
```text
src/markitect_tool/<area>/<feature>.py
tests/test_<area>_<feature>.py
docs/<feature>.md
```
If the extension is cross-cutting, register it from
`markitect_tool.extension.builtins` or a future internal discovery module rather
than importing it from many central files.
## Descriptor Template
```python
from markitect_tool.extension import ExtensionDescriptor, ProcessingCapability
def my_extension_descriptor() -> ExtensionDescriptor:
return ExtensionDescriptor(
id="query.example",
kind="query-engine",
summary="Example query engine.",
capabilities=[
ProcessingCapability(id="ast", kind="read"),
],
input_contract="Document + example expression",
output_contract="QueryMatch[]",
diagnostics_namespace="query.example",
provenance_prefix="query.example",
cli={"commands": ["mkt query --engine example"]},
docs=["docs/example-query.md"],
examples=["examples/query/example.md"],
)
```
## Optional Dependencies
Declare optional dependencies in descriptors:
```python
from markitect_tool.extension import OptionalDependency
OptionalDependency(
name="jsonpath_ng",
package="jsonpath-ng",
extra="query",
required=True,
purpose="Evaluate JSONPath expressions.",
)
```
If a dependency is missing, return a structured diagnostic. Do not fail with an
unexplained import error.
## Processing Envelopes
Use canonical processing envelopes where an extension needs a shared execution
boundary:
- `ProcessingRequest`
- `ProcessingContext`
- `ProcessingResult`
- `ProcessingCapability`
- `ProcessingProvenance`
- `ProcessingTrace`
Subsystem-specific dataclasses may remain richer. The canonical model is the
bridge that lets callbacks, registries, diagnostics, provenance, and future
policy checks interact consistently.
## Diagnostics
Diagnostics should be:
- stable enough for tests and callers
- namespaced by subsystem or extension
- explicit about optional dependency failures
- tied to source locations where possible
- emitted as `Diagnostic` or `ProcessingResult.from_error`
Recommended code style:
```text
<extension-kind>.<condition>
query.invalid_jsonpath
processor.unknown
extension.missing_dependency
backend.local_sqlite.invalid_fts_query
```
## Provenance
Every extension that transforms, queries, reads, writes, generates, or indexes
content should expose provenance. Use a stable operation prefix:
```text
query.selector
query.jsonpath
processor.include
local_snapshot_store.put_file
```
Include source path, content hash, snapshot id, backend/provider id, and
dependencies when known.
## Safety And Policy
Descriptors should declare safety-relevant behavior:
- reads files
- writes local cache
- writes user output files
- accesses network
- invokes external process
- calls assisted-generation provider
- transmits content outside the local process
The initial framework records this metadata. Later policy layers can enforce it.
## CLI Affordances
If an extension exposes CLI behavior, declare it in `descriptor.cli`:
```python
cli={"commands": ["mkt cache index", "mkt search"]}
```
`markitect_tool.cli.extensions.collect_cli_command_specs()` can inspect these
affordances without importing Click command implementations.
## Testing Checklist
Add tests for:
- descriptor serialization
- registry lookup and duplicate handling
- missing optional dependency diagnostics
- canonical result validity
- provenance shape
- CLI output envelope if public commands are exposed
- compatibility shim if replacing an existing API
When refactoring an existing feature, add characterization tests first, then
migrate implementation behind descriptors or registries.
## Boundary With Workflows
Internal extensions describe what Markitect can do. Workflows describe how a
user combines capabilities for a concrete document pipeline.
An extension may expose a workflow step later, but it should not depend on the
workflow engine to be useful from the library or CLI.

View File

@@ -0,0 +1,149 @@
# Internal Extension Framework
## Purpose
Markitect has reached the point where optional features are useful but are
starting to concentrate wiring in central modules. Query engines, processors,
backend stores, references, contract checks, templates, generation adapters, and
CLI commands all need some combination of registration, capability metadata,
diagnostics, provenance, and optional dependency handling.
The internal extension framework should make those seams explicit without
turning the project into a heavy external plugin platform.
## Boundary
This framework is about internal extensibility:
```text
feature descriptor -> registry -> processing request/context/result
-> diagnostics/provenance/capabilities
-> CLI/API/backend integration
```
It is not the same as `MKTT-WP-0011` dataflow workflows. Workflows organize
business-facing processing steps for a document pipeline. The extension
framework organizes how Markitect itself exposes and composes capabilities.
## Extension Taxonomy
| Kind | Examples | Primary Contract |
| --- | --- | --- |
| `query-engine` | selector, JSONPath | document/data in, matches out |
| `processor` | identity, uppercase, include | fenced block in, processed result out |
| `backend` | local SQLite index | snapshots/index/search storage |
| `reference-provider` | section, region, fence, line | address in, content units out |
| `validator` | schema, contract, section assertion | document/context in, diagnostics out |
| `template-engine` | deterministic templates | template/data in, Markdown out |
| `generation-adapter` | provider-neutral assisted generation | request in, generated candidate out |
| `cli-group` | cache, backend, ref, class | command descriptors or registration hook |
| `render-export` | future Quarkdown/export adapters | Markdown source in, rendered/exported artifact out |
| `document-function` | future function layer | function call in, typed document value out |
## Canonical Lifecycle
An extension should be describable before it runs:
1. Register descriptor.
2. Check optional dependencies.
3. Check capabilities and policy labels.
4. Build processing context.
5. Execute operation.
6. Normalize result, diagnostics, provenance, and trace data.
7. Expose output through library API, CLI, backend, or workflow layer.
The framework should allow deterministic extensions to stay simple. Assisted,
external, networked, or filesystem-mutating extensions should declare that
explicitly before execution.
## Descriptor Shape
The first descriptor model should cover:
- stable id
- kind
- version
- summary
- implementation reference
- capability declarations
- optional dependency declarations
- safety flags
- input and output contract names
- diagnostics namespace
- provenance operation prefix
- documentation and example links
- CLI affordances where applicable
Descriptors are not meant to replace implementation modules. They are the small
declarative surface that lets Markitect inspect, list, validate, and compose
capabilities consistently.
## Processing Model
The canonical processing model should define a small set of shared envelopes:
- `ProcessingRequest`: operation id, input payload, options, scope
- `ProcessingContext`: root, source path, namespaces, variables, policy, backend
handles, and caller metadata
- `ProcessingResult`: output payload, diagnostics, provenance, dependencies,
trace events, and validity
- `ProcessingDiagnostic`: severity, code, message, source, help
- `ProcessingCapability`: declared feature or permission requirement
- `ProcessingProvenance`: operation, source identity, snapshot/content hashes,
dependencies, backend/provider metadata
Subsystem-specific types may remain richer. The canonical model is the bridge,
not a forced replacement for every local dataclass.
## Registration Strategy
Start with in-package registration:
```text
markitect_tool/extensions/
query_selector.py
query_jsonpath.py
backend_local_sqlite.py
processors_builtin.py
```
Each module exposes one or more descriptors plus a registration function. The
root registry can be assembled explicitly at import time or by a small internal
discovery list. Package entry points can be added later if external extension
packages become a real requirement.
See `docs/extension-authoring.md` for the extension authoring checklist and
descriptor template.
## Compatibility Rules
The refactor must preserve:
- current library APIs such as `query_document`
- current CLI commands and output envelopes
- current diagnostic codes where users may rely on them
- current provenance operation strings unless intentionally deprecated
- optional dependency behavior for JSONPath and future adapters
- cache/index file compatibility unless a migration is documented
The first implementation adds canonical processing envelopes, extension
descriptors, registries, lifecycle callbacks, query-engine registry shims,
built-in extension descriptors, and CLI command specs while preserving existing
public commands.
## Characterization Coverage
Before refactoring, lock down:
- selector query and extraction
- optional JSONPath diagnostics
- processor registry behavior and provenance
- backend manifest registry and capability checks
- local SQLite snapshot/index/search behavior
- content reference resolution
- representative CLI command envelopes
- provenance and diagnostic shapes
These tests are deliberately a little redundant with unit tests. Their job is
to protect the current public behavior while internals move behind extension
descriptors and registries.

View File

@@ -34,7 +34,7 @@ and descriptions mirror the operational view.
| `MKTT-WP-0006` | complete | done | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T005` | Optional backend fabric is complete: manifests, capabilities, snapshot identity, interfaces, registry, provenance, and read-only CLI scaffolding. |
| `MKTT-WP-0010` | complete | done | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Content references, processors, explode/implode, weave/tangle, content classes, and migration examples are complete as the first WP-0010 extension layer. |
| `MKTT-WP-0007` | complete | done | `MKTT-WP-0006` | Advanced query and local index backend is complete: AST inspection, optional JSONPath, SQLite snapshots/metadata, FTS5 search, incremental refresh, and local index CLI. |
| `MKTT-WP-0013` | P1 | todo | `MKTT-WP-0003`, `MKTT-WP-0004`, `MKTT-WP-0006`, `MKTT-WP-0007`, `MKTT-WP-0010` | Internal extension framework and canonical processing model: characterize current behavior, add registries/descriptors/callbacks, and reduce central wiring before heavier runtime/workflow work. |
| `MKTT-WP-0013` | complete | done | `MKTT-WP-0003`, `MKTT-WP-0004`, `MKTT-WP-0006`, `MKTT-WP-0007`, `MKTT-WP-0010` | Internal extension framework is complete: characterization tests, canonical processing model, descriptors, registries, lifecycle callbacks, query-engine registry, built-in extension catalog, CLI command specs, and authoring guide. |
| `MKTT-WP-0005` | P2 | todo | `MKTT-WP-0003`, `MKTT-WP-0004` | Pick up when generation/form/context or semantic assessment pressure appears. |
| `MKTT-WP-0011` | P2 | todo | `MKTT-WP-0003`; task-level triggers: `MKTT-WP-0010-T001`, `MKTT-WP-0010-T005` | Declarative Markdown dataflow workflows: source extraction, deterministic/assisted processing, and multi-output generation. |
| `MKTT-WP-0009` | P2 | todo | `MKTT-WP-0006` | Establish access-control gateway before security-sensitive cache/context use. |

View File

@@ -0,0 +1,64 @@
"""CLI extension specifications derived from internal extension descriptors."""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Any
from markitect_tool.extension import ExtensionDescriptor, ExtensionRegistry
@dataclass(frozen=True)
class CliCommandSpec:
"""Inspectable command affordance declared by an extension."""
command: str
extension_id: str
kind: str
summary: str | None = None
metadata: dict[str, Any] = field(default_factory=dict)
def to_dict(self) -> dict[str, Any]:
data = {
"command": self.command,
"extension_id": self.extension_id,
"kind": self.kind,
"summary": self.summary,
"metadata": self.metadata,
}
return {
key: value
for key, value in data.items()
if value not in (None, {}, [])
}
def command_specs_from_extension(descriptor: ExtensionDescriptor) -> list[CliCommandSpec]:
"""Return CLI command specs declared by one extension descriptor."""
raw_commands = descriptor.cli.get("commands", [])
if isinstance(raw_commands, str):
raw_commands = [raw_commands]
return [
CliCommandSpec(
command=str(command),
extension_id=descriptor.id,
kind=descriptor.kind,
summary=descriptor.summary,
metadata={
key: value
for key, value in descriptor.cli.items()
if key != "commands"
},
)
for command in raw_commands
]
def collect_cli_command_specs(registry: ExtensionRegistry) -> list[CliCommandSpec]:
"""Collect CLI affordances from a registry of extension descriptors."""
specs: list[CliCommandSpec] = []
for descriptor in registry.list():
specs.extend(command_specs_from_extension(descriptor))
return sorted(specs, key=lambda spec: (spec.command, spec.extension_id))

View File

@@ -0,0 +1,56 @@
"""Internal extension framework primitives."""
from markitect_tool.extension.processing import (
ProcessingCapability,
ProcessingContext,
ProcessingDiagnostic,
ProcessingProvenance,
ProcessingRequest,
ProcessingResult,
ProcessingTrace,
)
from markitect_tool.extension.execution import (
AfterCallback,
BeforeCallback,
ExtensionExecutor,
ExtensionLifecycle,
ExtensionRunner,
)
from markitect_tool.extension.registry import (
ExtensionDependencyCheck,
ExtensionDescriptor,
ExtensionRegistry,
ExtensionRegistryError,
OptionalDependency,
)
__all__ = [
"ProcessingCapability",
"ProcessingContext",
"ProcessingDiagnostic",
"ProcessingProvenance",
"ProcessingRequest",
"ProcessingResult",
"ProcessingTrace",
"ExtensionDependencyCheck",
"ExtensionDescriptor",
"ExtensionRegistry",
"ExtensionRegistryError",
"OptionalDependency",
"AfterCallback",
"BeforeCallback",
"ExtensionExecutor",
"ExtensionLifecycle",
"ExtensionRunner",
]
def builtin_extension_registry():
"""Return built-in extension descriptors without import-cycle pressure."""
from markitect_tool.extension.builtins import builtin_extension_registry as _registry
return _registry()
__all__.append("builtin_extension_registry")

View File

@@ -0,0 +1,92 @@
"""Built-in internal extension descriptors."""
from __future__ import annotations
from markitect_tool.extension.registry import ExtensionDescriptor, ExtensionRegistry
from markitect_tool.extension.processing import ProcessingCapability
from markitect_tool.query import default_query_engine_registry
def builtin_extension_registry() -> ExtensionRegistry:
"""Return descriptors for built-in Markitect extensions."""
registry = default_query_engine_registry().extension_registry()
for descriptor in _processor_descriptors() + [_local_sqlite_backend_descriptor()]:
registry.register(descriptor)
return registry
def _processor_descriptors() -> list[ExtensionDescriptor]:
return [
ExtensionDescriptor(
id="processor.identity",
kind="processor",
summary="Return fenced block content unchanged.",
capabilities=[
ProcessingCapability(id="processor", kind="execute"),
ProcessingCapability(id="deterministic", kind="execution"),
],
input_contract="ProcessorRequest",
output_contract="ProcessorResult",
diagnostics_namespace="processor",
provenance_prefix="processor.identity",
cli={"commands": ["mkt process"]},
docs=["docs/processors.md"],
),
ExtensionDescriptor(
id="processor.uppercase",
kind="processor",
summary="Uppercase fenced block content deterministically.",
capabilities=[
ProcessingCapability(id="processor", kind="execute"),
ProcessingCapability(id="deterministic", kind="execution"),
],
input_contract="ProcessorRequest",
output_contract="ProcessorResult",
diagnostics_namespace="processor",
provenance_prefix="processor.uppercase",
cli={"commands": ["mkt process"]},
docs=["docs/processors.md"],
),
ExtensionDescriptor(
id="processor.include",
kind="processor",
summary="Resolve a content reference into fenced block output.",
capabilities=[
ProcessingCapability(id="processor", kind="execute"),
ProcessingCapability(id="references", kind="read"),
ProcessingCapability(id="filesystem", kind="read"),
],
safety={"reads_files": True, "writes_files": False, "network": False},
input_contract="ProcessorRequest",
output_contract="ProcessorResult",
diagnostics_namespace="processor",
provenance_prefix="processor.include",
cli={"commands": ["mkt process"]},
docs=["docs/processors.md", "docs/content-references.md"],
),
]
def _local_sqlite_backend_descriptor() -> ExtensionDescriptor:
return ExtensionDescriptor(
id="backend.local-sqlite",
kind="backend",
summary="Local SQLite snapshot, metadata, JSON, and FTS5 index backend.",
capabilities=[
ProcessingCapability(id="snapshots", kind="backend"),
ProcessingCapability(id="ast", kind="backend"),
ProcessingCapability(id="json", kind="backend"),
ProcessingCapability(id="fts", kind="backend"),
ProcessingCapability(id="sql", kind="backend"),
ProcessingCapability(id="provenance", kind="backend"),
],
safety={"reads_files": True, "writes_local_cache": True, "network": False},
input_contract="Markdown files/directories",
output_contract="SQLite snapshot/index store",
diagnostics_namespace="backend.local_sqlite",
provenance_prefix="local_snapshot_store",
cli={"commands": ["mkt cache init", "mkt cache index", "mkt cache query", "mkt search"]},
docs=["docs/local-index-backend.md", "docs/backend-fabric.md"],
examples=["examples/backends/local-sqlite-backend.md"],
)

View File

@@ -0,0 +1,98 @@
"""Execution lifecycle for internal extensions."""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable
from markitect_tool.extension.processing import (
ProcessingRequest,
ProcessingResult,
ProcessingTrace,
)
from markitect_tool.extension.registry import (
ExtensionDescriptor,
ExtensionRegistry,
ExtensionRegistryError,
)
ExtensionRunner = Callable[[ProcessingRequest], ProcessingResult]
BeforeCallback = Callable[[ExtensionDescriptor, ProcessingRequest], None]
AfterCallback = Callable[[ExtensionDescriptor, ProcessingRequest, ProcessingResult], None]
@dataclass
class ExtensionLifecycle:
"""Explicit callbacks around extension execution."""
before: list[BeforeCallback] = field(default_factory=list)
after_success: list[AfterCallback] = field(default_factory=list)
after_failure: list[AfterCallback] = field(default_factory=list)
after: list[AfterCallback] = field(default_factory=list)
def on_before(self, callback: BeforeCallback) -> None:
self.before.append(callback)
def on_success(self, callback: AfterCallback) -> None:
self.after_success.append(callback)
def on_failure(self, callback: AfterCallback) -> None:
self.after_failure.append(callback)
def on_after(self, callback: AfterCallback) -> None:
self.after.append(callback)
class ExtensionExecutor:
"""Execute registered extensions with deterministic lifecycle callbacks."""
def __init__(
self,
registry: ExtensionRegistry,
*,
lifecycle: ExtensionLifecycle | None = None,
) -> None:
self.registry = registry
self.lifecycle = lifecycle or ExtensionLifecycle()
def execute(self, extension_id: str, request: ProcessingRequest) -> ProcessingResult:
descriptor = self.registry.get(extension_id)
dependency_check = self.registry.check_dependencies(extension_id)
if not dependency_check.compatible:
return ProcessingResult.from_error(
code="extension.missing_dependency",
message=f"Extension `{extension_id}` is missing required dependencies.",
details=dependency_check.to_dict(),
)
runner = descriptor.instantiate()
if not callable(runner):
raise ExtensionRegistryError(f"Extension `{extension_id}` factory did not return a callable")
for callback in self.lifecycle.before:
callback(descriptor, request)
result = runner(request)
if not isinstance(result, ProcessingResult):
raise ExtensionRegistryError(
f"Extension `{extension_id}` returned {type(result).__name__}, expected ProcessingResult"
)
result = _with_trace(result, ProcessingTrace(event="extension.executed", metadata={"id": extension_id}))
callbacks = self.lifecycle.after_success if result.valid else self.lifecycle.after_failure
for callback in callbacks:
callback(descriptor, request, result)
for callback in self.lifecycle.after:
callback(descriptor, request, result)
return result
def _with_trace(result: ProcessingResult, trace: ProcessingTrace) -> ProcessingResult:
return ProcessingResult(
output=result.output,
diagnostics=result.diagnostics,
provenance=result.provenance,
dependencies=result.dependencies,
trace=[*result.trace, trace],
metadata=result.metadata,
)

View File

@@ -0,0 +1,184 @@
"""Canonical processing envelopes for internal extensions."""
from __future__ import annotations
import hashlib
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any
from markitect_tool.diagnostics import Diagnostic, SourceLocation, has_error
@dataclass(frozen=True)
class ProcessingCapability:
"""A declared capability or permission needed by an extension."""
id: str
kind: str = "feature"
required: bool = True
description: str | None = None
metadata: dict[str, Any] = field(default_factory=dict)
def to_dict(self) -> dict[str, Any]:
return _drop_empty(asdict(self))
@dataclass(frozen=True)
class ProcessingProvenance:
"""Cross-extension provenance envelope."""
operation: str
source_path: str | None = None
snapshot_id: str | None = None
content_hash: str | None = None
dependencies: list[str] = field(default_factory=list)
backend_id: str | None = None
provider_id: str | None = None
metadata: dict[str, Any] = field(default_factory=dict)
def to_dict(self) -> dict[str, Any]:
return _drop_empty(asdict(self))
@dataclass(frozen=True)
class ProcessingTrace:
"""One optional execution trace event."""
event: str
message: str | None = None
metadata: dict[str, Any] = field(default_factory=dict)
def to_dict(self) -> dict[str, Any]:
return _drop_empty(asdict(self))
@dataclass(frozen=True)
class ProcessingContext:
"""Shared execution context available to extension implementations."""
root: Path = Path(".")
source_path: Path | None = None
namespaces: dict[str, str] = field(default_factory=dict)
variables: dict[str, Any] = field(default_factory=dict)
policy: dict[str, Any] = field(default_factory=dict)
backend_handles: dict[str, Any] = field(default_factory=dict, repr=False, compare=False)
caller: str | None = None
metadata: dict[str, Any] = field(default_factory=dict)
def to_dict(self) -> dict[str, Any]:
data = {
"root": str(self.root),
"source_path": str(self.source_path) if self.source_path else None,
"namespaces": self.namespaces,
"variables": self.variables,
"policy": self.policy,
"backend_handles": sorted(self.backend_handles),
"caller": self.caller,
"metadata": self.metadata,
}
return _drop_empty(data)
@dataclass(frozen=True)
class ProcessingRequest:
"""Canonical request passed to an internal extension."""
operation: str
input: Any
context: ProcessingContext = field(default_factory=ProcessingContext)
options: dict[str, Any] = field(default_factory=dict)
scope: str | None = None
capabilities: list[ProcessingCapability] = field(default_factory=list)
metadata: dict[str, Any] = field(default_factory=dict)
@property
def cache_key(self) -> str:
payload = {
"operation": self.operation,
"input": self.input,
"options": self.options,
"scope": self.scope,
"capabilities": [capability.to_dict() for capability in self.capabilities],
"metadata": self.metadata,
}
return "processing:" + hashlib.sha256(
json.dumps(payload, sort_keys=True, ensure_ascii=False, default=str).encode("utf-8")
).hexdigest()
def to_dict(self) -> dict[str, Any]:
data = {
"operation": self.operation,
"input": self.input,
"context": self.context.to_dict(),
"options": self.options,
"scope": self.scope,
"capabilities": [capability.to_dict() for capability in self.capabilities],
"metadata": self.metadata,
"cache_key": self.cache_key,
}
return _drop_empty(data)
@dataclass(frozen=True)
class ProcessingResult:
"""Canonical result returned by an internal extension."""
output: Any = None
diagnostics: list[Diagnostic] = field(default_factory=list)
provenance: list[ProcessingProvenance] = field(default_factory=list)
dependencies: list[str] = field(default_factory=list)
trace: list[ProcessingTrace] = field(default_factory=list)
metadata: dict[str, Any] = field(default_factory=dict)
@property
def valid(self) -> bool:
return not has_error(self.diagnostics)
def to_dict(self) -> dict[str, Any]:
data = {
"valid": self.valid,
"output": self.output,
"diagnostics": [diagnostic.to_dict() for diagnostic in self.diagnostics],
"provenance": [event.to_dict() for event in self.provenance],
"dependencies": self.dependencies,
"trace": [event.to_dict() for event in self.trace],
"metadata": self.metadata,
}
return _drop_empty(data)
@classmethod
def from_error(
cls,
*,
code: str,
message: str,
source_path: str | None = None,
line: int | None = None,
details: dict[str, Any] | None = None,
) -> "ProcessingResult":
return cls(
diagnostics=[
Diagnostic(
severity="error",
code=code,
message=message,
source=SourceLocation(path=source_path, line=line)
if source_path or line
else None,
details=details or {},
)
]
)
ProcessingDiagnostic = Diagnostic
def _drop_empty(data: dict[str, Any]) -> dict[str, Any]:
return {
key: value
for key, value in data.items()
if value not in (None, [], {}, "")
}

View File

@@ -0,0 +1,193 @@
"""Extension descriptors and registries."""
from __future__ import annotations
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Iterable
from markitect_tool.extension.processing import ProcessingCapability
ExtensionFactory = Callable[[], Any]
class ExtensionRegistryError(ValueError):
"""Raised when extension descriptors or registries are invalid."""
@dataclass(frozen=True)
class OptionalDependency:
"""An optional runtime dependency declared by an extension."""
name: str
package: str | None = None
extra: str | None = None
required: bool = False
purpose: str | None = None
def to_dict(self) -> dict[str, Any]:
return _drop_empty(asdict(self))
@dataclass(frozen=True)
class ExtensionDescriptor:
"""Inspectable descriptor for one internal extension."""
id: str
kind: str
version: str = "1"
summary: str | None = None
factory: ExtensionFactory | None = field(default=None, compare=False, repr=False)
capabilities: list[ProcessingCapability] = field(default_factory=list)
optional_dependencies: list[OptionalDependency] = field(default_factory=list)
safety: dict[str, Any] = field(default_factory=dict)
input_contract: str | None = None
output_contract: str | None = None
diagnostics_namespace: str | None = None
provenance_prefix: str | None = None
cli: dict[str, Any] = field(default_factory=dict)
docs: list[str] = field(default_factory=list)
examples: list[str] = field(default_factory=list)
metadata: dict[str, Any] = field(default_factory=dict)
def __post_init__(self) -> None:
if not self.id.strip():
raise ExtensionRegistryError("Extension id cannot be empty")
if not self.kind.strip():
raise ExtensionRegistryError("Extension kind cannot be empty")
def to_dict(self) -> dict[str, Any]:
data = {
"id": self.id,
"kind": self.kind,
"version": self.version,
"summary": self.summary,
"capabilities": [capability.to_dict() for capability in self.capabilities],
"optional_dependencies": [
dependency.to_dict() for dependency in self.optional_dependencies
],
"safety": self.safety,
"input_contract": self.input_contract,
"output_contract": self.output_contract,
"diagnostics_namespace": self.diagnostics_namespace,
"provenance_prefix": self.provenance_prefix,
"cli": self.cli,
"docs": self.docs,
"examples": self.examples,
"metadata": self.metadata,
}
return _drop_empty(data)
def instantiate(self) -> Any:
"""Create or return the implementation for this descriptor."""
if self.factory is None:
raise ExtensionRegistryError(f"Extension `{self.id}` has no factory")
return self.factory()
@dataclass(frozen=True)
class ExtensionDependencyCheck:
"""Result of checking required extension dependencies."""
extension_id: str
missing: list[str] = field(default_factory=list)
optional_missing: list[str] = field(default_factory=list)
@property
def compatible(self) -> bool:
return not self.missing
def to_dict(self) -> dict[str, Any]:
return {
"extension_id": self.extension_id,
"compatible": self.compatible,
"missing": self.missing,
"optional_missing": self.optional_missing,
}
class ExtensionRegistry:
"""Registry of internal extension descriptors."""
def __init__(self, descriptors: Iterable[ExtensionDescriptor] | None = None) -> None:
self._descriptors: dict[str, ExtensionDescriptor] = {}
for descriptor in descriptors or []:
self.register(descriptor)
def register(self, descriptor: ExtensionDescriptor) -> None:
if descriptor.id in self._descriptors:
raise ExtensionRegistryError(f"Duplicate extension id `{descriptor.id}`")
self._descriptors[descriptor.id] = descriptor
def get(self, extension_id: str) -> ExtensionDescriptor:
try:
return self._descriptors[extension_id]
except KeyError as exc:
raise ExtensionRegistryError(f"Unknown extension `{extension_id}`") from exc
def list(self, *, kind: str | None = None) -> list[ExtensionDescriptor]:
descriptors = [self._descriptors[key] for key in sorted(self._descriptors)]
if kind is None:
return descriptors
return [descriptor for descriptor in descriptors if descriptor.kind == kind]
def require_capability(self, capability_id: str) -> list[ExtensionDescriptor]:
return [
descriptor
for descriptor in self.list()
if any(capability.id == capability_id for capability in descriptor.capabilities)
]
def check_dependencies(
self,
extension_id: str,
*,
available_modules: set[str] | None = None,
) -> ExtensionDependencyCheck:
descriptor = self.get(extension_id)
available = (
available_modules
if available_modules is not None
else _available_modules(
dependency.name for dependency in descriptor.optional_dependencies
)
)
missing: list[str] = []
optional_missing: list[str] = []
for dependency in descriptor.optional_dependencies:
if dependency.name in available:
continue
if dependency.required:
missing.append(dependency.name)
else:
optional_missing.append(dependency.name)
return ExtensionDependencyCheck(
extension_id=extension_id,
missing=missing,
optional_missing=optional_missing,
)
def to_dict(self) -> dict[str, Any]:
return {
"count": len(self._descriptors),
"extensions": [descriptor.to_dict() for descriptor in self.list()],
}
def _available_modules(module_names: Iterable[str]) -> set[str]:
import importlib.util
return {
module_name
for module_name in module_names
if importlib.util.find_spec(module_name) is not None
}
def _drop_empty(data: dict[str, Any]) -> dict[str, Any]:
return {
key: value
for key, value in data.items()
if value not in (None, [], {}, "")
}

View File

@@ -5,8 +5,15 @@ from markitect_tool.query.engine import (
QueryMatch,
extract_document,
extract_document_jsonpath,
extract_document_with_engine,
query_document,
query_document_jsonpath,
query_document_with_engine,
)
from markitect_tool.query.registry import (
QueryEngine,
QueryEngineRegistry,
default_query_engine_registry,
)
__all__ = [
@@ -14,6 +21,11 @@ __all__ = [
"QueryMatch",
"extract_document",
"extract_document_jsonpath",
"extract_document_with_engine",
"query_document",
"query_document_jsonpath",
"query_document_with_engine",
"QueryEngine",
"QueryEngineRegistry",
"default_query_engine_registry",
]

View File

@@ -44,6 +44,29 @@ class _Selector:
def query_document(document: Document, selector: str) -> list[QueryMatch]:
"""Query a parsed document with a small Markitect selector."""
return query_document_with_engine(document, selector, engine="selector")
def query_document_with_engine(
document: Document,
selector: str,
*,
engine: str = "selector",
) -> list[QueryMatch]:
"""Query a parsed document through a registered query engine."""
from markitect_tool.query.registry import default_query_engine_registry
try:
query_engine = default_query_engine_registry().get(engine)
except ValueError as exc:
raise InvalidQueryError(str(exc)) from exc
return query_engine.query(document, selector)
def _query_document_selector(document: Document, selector: str) -> list[QueryMatch]:
"""Query a parsed document with the built-in selector engine."""
parsed = _parse_selector(selector)
if parsed.target in {"document", "$", "."}:
return [QueryMatch(kind="document", path="$", value=document.to_dict())]
@@ -67,6 +90,12 @@ def query_document_jsonpath(document: Document, expression: str) -> list[QueryMa
remains dependency-light. Install ``markitect-tool[query]`` to enable it.
"""
return query_document_with_engine(document, expression, engine="jsonpath")
def _query_document_jsonpath(document: Document, expression: str) -> list[QueryMatch]:
"""Implementation for the registered optional JSONPath engine."""
try:
from jsonpath_ng.ext import parse as parse_jsonpath
except ImportError as exc: # pragma: no cover - branch depends on env deps
@@ -110,14 +139,29 @@ def extract_document(document: Document, selector: str) -> list[str]:
return extracted
def extract_document_with_engine(
document: Document,
selector: str,
*,
engine: str = "selector",
) -> list[str]:
"""Extract textual query matches through a registered query engine."""
extracted: list[str] = []
for match in query_document_with_engine(document, selector, engine=engine):
if match.text is not None:
extracted.append(match.text)
elif isinstance(match.value, str):
extracted.append(match.value)
elif isinstance(match.value, int | float | bool):
extracted.append(str(match.value))
return extracted
def extract_document_jsonpath(document: Document, expression: str) -> list[str]:
"""Extract textual JSONPath matches from a parsed document."""
extracted: list[str] = []
for match in query_document_jsonpath(document, expression):
if match.text is not None:
extracted.append(match.text)
return extracted
return extract_document_with_engine(document, expression, engine="jsonpath")
def _parse_selector(selector: str) -> _Selector:

View File

@@ -0,0 +1,105 @@
"""Query engine registry adapters."""
from __future__ import annotations
from dataclasses import dataclass
from typing import Callable
from markitect_tool.core import Document
from markitect_tool.extension import (
ExtensionDescriptor,
ExtensionRegistry,
OptionalDependency,
ProcessingCapability,
)
from markitect_tool.query.engine import QueryMatch
QueryCallable = Callable[[Document, str], list[QueryMatch]]
@dataclass(frozen=True)
class QueryEngine:
"""Registered query engine implementation."""
descriptor: ExtensionDescriptor
query: QueryCallable
class QueryEngineRegistry:
"""Registry of query engines keyed by short engine id."""
def __init__(self, engines: list[QueryEngine] | None = None) -> None:
self._engines: dict[str, QueryEngine] = {}
for engine in engines or []:
self.register(engine)
def register(self, engine: QueryEngine) -> None:
if engine.descriptor.id in self._engines:
raise ValueError(f"Duplicate query engine `{engine.descriptor.id}`")
self._engines[engine.descriptor.id] = engine
def get(self, engine_id: str) -> QueryEngine:
try:
return self._engines[engine_id]
except KeyError as exc:
raise ValueError(f"Unknown query engine `{engine_id}`") from exc
def list(self) -> list[QueryEngine]:
return [self._engines[key] for key in sorted(self._engines)]
def extension_registry(self) -> ExtensionRegistry:
return ExtensionRegistry(engine.descriptor for engine in self.list())
def default_query_engine_registry() -> QueryEngineRegistry:
"""Return the built-in query engine registry."""
from markitect_tool.query.engine import (
_query_document_jsonpath,
_query_document_selector,
)
return QueryEngineRegistry(
[
QueryEngine(
descriptor=ExtensionDescriptor(
id="selector",
kind="query-engine",
summary="Compact Markitect selector engine.",
capabilities=[ProcessingCapability(id="ast", kind="read")],
input_contract="Document + selector",
output_contract="QueryMatch[]",
diagnostics_namespace="query",
provenance_prefix="query.selector",
cli={"commands": ["mkt query", "mkt extract", "mkt cache query"]},
docs=["docs/query-extraction.md"],
),
query=_query_document_selector,
),
QueryEngine(
descriptor=ExtensionDescriptor(
id="jsonpath",
kind="query-engine",
summary="Optional JSONPath engine over Document.to_dict().",
capabilities=[ProcessingCapability(id="ast", kind="read")],
optional_dependencies=[
OptionalDependency(
name="jsonpath_ng",
package="jsonpath-ng",
extra="query",
required=True,
purpose="Evaluate JSONPath expressions.",
)
],
input_contract="Document + JSONPath expression",
output_contract="QueryMatch[]",
diagnostics_namespace="query.jsonpath",
provenance_prefix="query.jsonpath",
cli={"commands": ["mkt query --engine jsonpath", "mkt extract --engine jsonpath"]},
docs=["docs/query-extraction.md", "docs/local-index-backend.md"],
),
query=_query_document_jsonpath,
),
]
)

View File

@@ -0,0 +1,50 @@
from markitect_tool.extension import builtin_extension_registry
def test_builtin_extension_registry_lists_query_processors_and_backend():
registry = builtin_extension_registry()
ids = [descriptor.id for descriptor in registry.list()]
assert "query.selector" not in ids
assert "selector" in ids
assert "jsonpath" in ids
assert "processor.identity" in ids
assert "processor.uppercase" in ids
assert "processor.include" in ids
assert "backend.local-sqlite" in ids
def test_builtin_processor_descriptors_capture_safety_and_provenance():
registry = builtin_extension_registry()
include = registry.get("processor.include")
uppercase = registry.get("processor.uppercase")
assert include.kind == "processor"
assert include.safety["reads_files"] is True
assert include.provenance_prefix == "processor.include"
assert uppercase.safety == {}
assert uppercase.provenance_prefix == "processor.uppercase"
def test_builtin_local_sqlite_descriptor_exposes_backend_capabilities():
registry = builtin_extension_registry()
descriptor = registry.get("backend.local-sqlite")
assert descriptor.kind == "backend"
assert {capability.id for capability in descriptor.capabilities} >= {
"snapshots",
"ast",
"json",
"fts",
"sql",
"provenance",
}
assert descriptor.cli["commands"] == [
"mkt cache init",
"mkt cache index",
"mkt cache query",
"mkt search",
]

View File

@@ -0,0 +1,37 @@
from markitect_tool.cli.extensions import collect_cli_command_specs, command_specs_from_extension
from markitect_tool.extension import ExtensionDescriptor, builtin_extension_registry
def test_command_specs_from_extension_handles_string_and_list_forms():
one = ExtensionDescriptor(id="one", kind="test", cli={"commands": "mkt one"})
many = ExtensionDescriptor(id="many", kind="test", cli={"commands": ["mkt a", "mkt b"]})
assert [spec.command for spec in command_specs_from_extension(one)] == ["mkt one"]
assert [spec.command for spec in command_specs_from_extension(many)] == ["mkt a", "mkt b"]
def test_collect_cli_command_specs_from_builtin_registry():
specs = collect_cli_command_specs(builtin_extension_registry())
commands = {(spec.extension_id, spec.command) for spec in specs}
assert ("selector", "mkt query") in commands
assert ("processor.uppercase", "mkt process") in commands
assert ("backend.local-sqlite", "mkt cache index") in commands
assert ("backend.local-sqlite", "mkt search") in commands
def test_cli_command_spec_serializes_without_empty_fields():
spec = command_specs_from_extension(
ExtensionDescriptor(
id="query.selector",
kind="query-engine",
summary="Selector engine",
cli={"commands": ["mkt query"], "group": "query"},
)
)[0]
data = spec.to_dict()
assert data["command"] == "mkt query"
assert data["extension_id"] == "query.selector"
assert data["metadata"]["group"] == "query"

View File

@@ -0,0 +1,176 @@
from pathlib import Path
import builtins
from click.testing import CliRunner
import pytest
from markitect_tool.backend import (
LocalSnapshotStore,
capability_check,
load_backend_manifest,
load_backend_registry,
local_index_path_for,
)
from markitect_tool.cli import main
from markitect_tool.core import parse_markdown
from markitect_tool.processor import ProcessorContext, run_fenced_processors
from markitect_tool.query import (
default_query_engine_registry,
InvalidQueryError,
extract_document,
query_document,
query_document_jsonpath,
)
from markitect_tool.reference import ReferenceContext, resolve_reference
CHARACTERIZATION_DOC = """---
document_type: adr
status: accepted
---
# Decision Record
## Context
Authors need stable infrastructure seams.
## Decision
Use explicit registries and processing envelopes.
"""
def test_query_selector_and_extraction_characterization():
document = parse_markdown(CHARACTERIZATION_DOC)
registry = default_query_engine_registry()
section_matches = query_document(document, "sections[heading=Decision]")
extracted = extract_document(document, "frontmatter.status")
assert registry.get("selector").descriptor.kind == "query-engine"
assert len(section_matches) == 1
assert section_matches[0].kind == "section"
assert section_matches[0].path == "$.sections[2]"
assert section_matches[0].text.startswith("## Decision")
assert extracted == ["accepted"]
def test_jsonpath_missing_dependency_diagnostic_characterization(monkeypatch):
document = parse_markdown(CHARACTERIZATION_DOC)
real_import = builtins.__import__
def fake_import(name, *args, **kwargs):
if name.startswith("jsonpath_ng"):
raise ImportError("blocked")
return real_import(name, *args, **kwargs)
monkeypatch.setattr(builtins, "__import__", fake_import)
with pytest.raises(InvalidQueryError, match="optional `jsonpath-ng`"):
query_document_jsonpath(document, "$.headings[*].text")
def test_processor_registry_result_provenance_characterization():
markdown = """```mkt-uppercase {#shout}
hello
```
"""
run = run_fenced_processors(markdown, context=ProcessorContext())
assert run.valid
assert run.blocks[0].processor == "uppercase"
assert run.blocks[0].unit_id == "shout"
assert run.results[0].content == "HELLO\n"
assert run.results[0].provenance[0].operation == "processor.uppercase"
def test_unknown_processor_diagnostic_characterization():
markdown = """```mkt-missing {#x}
content
```
"""
run = run_fenced_processors(markdown, context=ProcessorContext())
assert not run.valid
diagnostic = run.results[0].diagnostics[0].to_dict()
assert diagnostic["severity"] == "error"
assert diagnostic["code"] == "processor.unknown"
assert "Unknown processor" in diagnostic["message"]
def test_backend_manifest_registry_characterization():
manifest = load_backend_manifest("examples/backends/local-sqlite-backend.md")
registry = load_backend_registry(["examples/backends"])
check = capability_check(manifest, ["snapshots", "fts", "provenance"])
assert manifest.id == "local-sqlite-cache"
assert registry.get("local-sqlite-cache").storage["engine"] == "sqlite"
assert check.compatible
def test_local_index_snapshot_query_search_characterization(tmp_path: Path):
source = tmp_path / "doc.md"
source.write_text(CHARACTERIZATION_DOC, encoding="utf-8")
store = LocalSnapshotStore(local_index_path_for(tmp_path))
build = store.build([tmp_path], root=tmp_path)
state = store.load_state()[0]
document = store.get_document("doc.md")
search_results = store.search("registries")
assert build.parsed == ["doc.md"]
assert state.path == "doc.md"
assert state.snapshot_id.startswith("snapshot:")
assert document["headings"][0]["text"] == "Decision Record"
assert search_results[0].path == "doc.md"
assert search_results[0].unit_kind in {"section", "block"}
def test_reference_resolution_characterization(tmp_path: Path):
context_file = tmp_path / "context.md"
target_file = tmp_path / "target.md"
context_file.write_text("# Context\n", encoding="utf-8")
target_file.write_text("# Target\n\n## Decision\n\nChosen text.\n", encoding="utf-8")
context = ReferenceContext(root=tmp_path, current_path=context_file)
resolution = resolve_reference("target.md#decision", context=context)
assert resolution.target_path == str(target_file.resolve())
assert resolution.units[0].kind == "section"
assert resolution.units[0].unit_id == "decision"
assert "Chosen text" in resolution.units[0].text
def test_cli_output_envelopes_characterization(tmp_path: Path):
source = tmp_path / "doc.md"
source.write_text(CHARACTERIZATION_DOC, encoding="utf-8")
runner = CliRunner()
query = runner.invoke(
main,
["query", str(source), "sections[heading=Decision]", "--format", "json"],
)
index = runner.invoke(main, ["cache", "index", str(tmp_path), "--root", str(tmp_path)])
cache_query = runner.invoke(
main,
[
"cache",
"query",
"frontmatter.status",
"--root",
str(tmp_path),
"--format",
"json",
],
)
assert query.exit_code == 0
assert '"engine": "selector"' in query.output
assert '"count": 1' in query.output
assert index.exit_code == 0
assert "parsed: 1" in index.output
assert cache_query.exit_code == 0
assert '"source_path": "doc.md"' in cache_query.output

View File

@@ -0,0 +1,98 @@
from markitect_tool.extension import (
ExtensionDescriptor,
ExtensionExecutor,
ExtensionLifecycle,
ExtensionRegistry,
ExtensionRegistryError,
OptionalDependency,
ProcessingRequest,
ProcessingResult,
)
def test_extension_executor_runs_callbacks_in_order():
events: list[str] = []
def runner(request: ProcessingRequest) -> ProcessingResult:
events.append(f"run:{request.operation}")
return ProcessingResult(output={"ok": True})
lifecycle = ExtensionLifecycle()
lifecycle.on_before(lambda descriptor, request: events.append(f"before:{descriptor.id}"))
lifecycle.on_success(
lambda descriptor, request, result: events.append(f"success:{result.output['ok']}")
)
lifecycle.on_after(lambda descriptor, request, result: events.append("after"))
registry = ExtensionRegistry(
[ExtensionDescriptor(id="fake.runner", kind="test", factory=lambda: runner)]
)
result = ExtensionExecutor(registry, lifecycle=lifecycle).execute(
"fake.runner",
ProcessingRequest(operation="fake.run", input={}),
)
assert result.valid
assert result.trace[-1].event == "extension.executed"
assert events == ["before:fake.runner", "run:fake.run", "success:True", "after"]
def test_extension_executor_routes_failure_callbacks():
events: list[str] = []
def runner(request: ProcessingRequest) -> ProcessingResult:
return ProcessingResult.from_error(code="fake.error", message="Nope")
lifecycle = ExtensionLifecycle()
lifecycle.on_failure(lambda descriptor, request, result: events.append(result.diagnostics[0].code))
registry = ExtensionRegistry(
[ExtensionDescriptor(id="fake.runner", kind="test", factory=lambda: runner)]
)
result = ExtensionExecutor(registry, lifecycle=lifecycle).execute(
"fake.runner",
ProcessingRequest(operation="fake.run", input={}),
)
assert not result.valid
assert events == ["fake.error"]
def test_extension_executor_blocks_missing_required_dependency():
registry = ExtensionRegistry(
[
ExtensionDescriptor(
id="query.jsonpath",
kind="query-engine",
factory=lambda: lambda request: ProcessingResult(output=[]),
optional_dependencies=[
OptionalDependency(name="definitely_missing_markitect_dep", required=True)
],
)
]
)
result = ExtensionExecutor(registry).execute(
"query.jsonpath",
ProcessingRequest(operation="query.jsonpath", input={}),
)
assert not result.valid
assert result.diagnostics[0].code == "extension.missing_dependency"
assert "definitely_missing_markitect_dep" in result.diagnostics[0].details["missing"]
def test_extension_executor_rejects_non_result_return():
registry = ExtensionRegistry(
[ExtensionDescriptor(id="bad.runner", kind="test", factory=lambda: lambda request: {})]
)
try:
ExtensionExecutor(registry).execute(
"bad.runner",
ProcessingRequest(operation="bad.run", input={}),
)
except ExtensionRegistryError as exc:
assert "expected ProcessingResult" in str(exc)
else:
raise AssertionError("Expected ExtensionRegistryError")

View File

@@ -0,0 +1,75 @@
from pathlib import Path
from markitect_tool.extension import (
ProcessingCapability,
ProcessingContext,
ProcessingProvenance,
ProcessingRequest,
ProcessingResult,
ProcessingTrace,
)
def test_processing_request_serializes_context_and_cache_key():
request = ProcessingRequest(
operation="query.selector",
input={"selector": "sections[heading=Decision]"},
context=ProcessingContext(root=Path("/workspace"), caller="cli"),
options={"format": "json"},
capabilities=[ProcessingCapability(id="ast", description="Read parsed AST")],
)
data = request.to_dict()
assert data["operation"] == "query.selector"
assert data["context"]["root"] == "/workspace"
assert data["context"]["caller"] == "cli"
assert data["capabilities"][0]["id"] == "ast"
assert request.cache_key.startswith("processing:")
assert request.cache_key == ProcessingRequest(
operation="query.selector",
input={"selector": "sections[heading=Decision]"},
context=ProcessingContext(root=Path("/other")),
options={"format": "json"},
capabilities=[ProcessingCapability(id="ast", description="Read parsed AST")],
).cache_key
def test_processing_result_validity_provenance_and_trace():
result = ProcessingResult(
output={"count": 1},
provenance=[
ProcessingProvenance(
operation="query.selector",
source_path="doc.md",
content_hash="sha256:abc",
dependencies=["doc.md"],
)
],
trace=[ProcessingTrace(event="query.start", metadata={"engine": "selector"})],
)
data = result.to_dict()
assert result.valid
assert data["valid"] is True
assert data["output"]["count"] == 1
assert data["provenance"][0]["operation"] == "query.selector"
assert data["trace"][0]["metadata"]["engine"] == "selector"
def test_processing_result_from_error_normalizes_diagnostics():
result = ProcessingResult.from_error(
code="extension.missing_dependency",
message="Install optional dependency.",
source_path="doc.md",
line=3,
details={"dependency": "jsonpath-ng"},
)
data = result.to_dict()
assert not result.valid
assert data["diagnostics"][0]["severity"] == "error"
assert data["diagnostics"][0]["source"]["path"] == "doc.md"
assert data["diagnostics"][0]["details"]["dependency"] == "jsonpath-ng"

View File

@@ -0,0 +1,112 @@
import pytest
from markitect_tool.extension import (
ExtensionDescriptor,
ExtensionRegistry,
ExtensionRegistryError,
OptionalDependency,
ProcessingCapability,
)
def test_extension_descriptor_serializes_contract_metadata():
descriptor = ExtensionDescriptor(
id="query.selector",
kind="query-engine",
summary="Small selector query engine.",
capabilities=[ProcessingCapability(id="ast", kind="read")],
input_contract="Document + selector",
output_contract="QueryMatch[]",
diagnostics_namespace="query",
provenance_prefix="query.selector",
cli={"command": "mkt query"},
docs=["docs/query-extraction.md"],
)
data = descriptor.to_dict()
assert data["id"] == "query.selector"
assert data["kind"] == "query-engine"
assert data["capabilities"][0]["id"] == "ast"
assert data["cli"]["command"] == "mkt query"
def test_extension_registry_lists_by_kind_and_capability():
selector = ExtensionDescriptor(
id="query.selector",
kind="query-engine",
capabilities=[ProcessingCapability(id="ast")],
)
local = ExtensionDescriptor(
id="backend.local-sqlite",
kind="backend",
capabilities=[ProcessingCapability(id="snapshots"), ProcessingCapability(id="fts")],
)
registry = ExtensionRegistry([local, selector])
assert [descriptor.id for descriptor in registry.list()] == [
"backend.local-sqlite",
"query.selector",
]
assert [descriptor.id for descriptor in registry.list(kind="query-engine")] == [
"query.selector"
]
assert [descriptor.id for descriptor in registry.require_capability("fts")] == [
"backend.local-sqlite"
]
def test_extension_registry_rejects_duplicate_ids():
descriptor = ExtensionDescriptor(id="query.selector", kind="query-engine")
registry = ExtensionRegistry([descriptor])
with pytest.raises(ExtensionRegistryError, match="Duplicate extension id"):
registry.register(descriptor)
def test_extension_registry_checks_optional_dependencies():
registry = ExtensionRegistry(
[
ExtensionDescriptor(
id="query.jsonpath",
kind="query-engine",
optional_dependencies=[
OptionalDependency(
name="jsonpath_ng",
package="jsonpath-ng",
extra="query",
required=True,
),
OptionalDependency(name="tabulate"),
],
)
]
)
missing = registry.check_dependencies("query.jsonpath", available_modules=set())
available = registry.check_dependencies(
"query.jsonpath",
available_modules={"jsonpath_ng", "tabulate"},
)
assert not missing.compatible
assert missing.missing == ["jsonpath_ng"]
assert missing.optional_missing == ["tabulate"]
assert available.compatible
def test_extension_descriptor_instantiates_factory():
descriptor = ExtensionDescriptor(
id="fake.extension",
kind="test",
factory=lambda: {"ready": True},
)
assert descriptor.instantiate() == {"ready": True}
def test_extension_descriptor_requires_factory_to_instantiate():
descriptor = ExtensionDescriptor(id="fake.extension", kind="test")
with pytest.raises(ExtensionRegistryError, match="has no factory"):
descriptor.instantiate()

View File

@@ -0,0 +1,28 @@
from markitect_tool.core import parse_markdown
from markitect_tool.query import (
default_query_engine_registry,
extract_document_with_engine,
query_document_with_engine,
)
def test_default_query_engine_registry_exposes_builtin_descriptors():
registry = default_query_engine_registry()
descriptors = registry.extension_registry().to_dict()["extensions"]
assert [engine.descriptor.id for engine in registry.list()] == ["jsonpath", "selector"]
assert {descriptor["id"] for descriptor in descriptors} == {"selector", "jsonpath"}
assert registry.get("selector").descriptor.cli["commands"][0] == "mkt query"
assert registry.get("jsonpath").descriptor.optional_dependencies[0].name == "jsonpath_ng"
def test_query_document_with_engine_uses_selector_registry():
document = parse_markdown("# Doc\n\n## Decision\n\nChosen.\n")
matches = query_document_with_engine(document, "sections[heading=Decision]", engine="selector")
extracted = extract_document_with_engine(document, "sections[heading=Decision]", engine="selector")
assert matches[0].kind == "section"
assert matches[0].path == "$.sections[1]"
assert extracted == ["## Decision\n\nChosen."]

View File

@@ -3,7 +3,7 @@ id: MKTT-WP-0013
type: workplan
title: "Internal Extension Framework and Canonical Processing Model"
domain: markitect
status: todo
status: done
owner: markitect-tool
topic_slug: markitect
planning_priority: P1
@@ -81,7 +81,7 @@ discovery without forcing dynamic loading or external dependency installation.
```task
id: MKTT-WP-0013-T001
status: todo
status: done
priority: high
state_hub_task_id: "ba106001-c953-435a-8012-0dd83533d309"
```
@@ -101,11 +101,16 @@ Define the internal extension taxonomy:
Output: architecture note explaining extension boundaries, lifecycle,
registration semantics, and relationship to `MKTT-WP-0011`.
Implemented: `docs/internal-extension-framework.md` defines the internal
extension boundary, extension taxonomy, canonical lifecycle, descriptor shape,
processing model, registration strategy, compatibility rules, and
characterization coverage.
## P13.2 - Add characterization tests before refactor
```task
id: MKTT-WP-0013-T002
status: todo
status: done
priority: high
state_hub_task_id: "a270cb7a-4dbf-4562-b0ab-d5dda5124086"
```
@@ -124,11 +129,17 @@ Lock down current behavior before moving code behind registries:
Output: focused characterization tests that can fail loudly if refactoring
changes public behavior.
Implemented: `tests/test_extension_characterization.py` covers selector
query/extraction, JSONPath optional-dependency diagnostics, processor
provenance and diagnostics, backend manifest/capability behavior, local
snapshot/index/search behavior, content references, and representative CLI
output envelopes.
## P13.3 - Define canonical processing model
```task
id: MKTT-WP-0013-T003
status: todo
status: done
priority: high
state_hub_task_id: "8c88b9a7-1e8d-401c-ad09-8b5a19ccba14"
```
@@ -148,11 +159,17 @@ operations without making every extension depend on every subsystem.
Output: framework module, tests, and migration guide for current subsystems.
Implemented: `markitect_tool.extension.processing` defines
`ProcessingRequest`, `ProcessingContext`, `ProcessingResult`,
`ProcessingDiagnostic`, `ProcessingCapability`, `ProcessingProvenance`, and
`ProcessingTrace`, with serialization, cache-key, validity, provenance, trace,
and error normalization tests.
## P13.4 - Implement extension descriptors and registries
```task
id: MKTT-WP-0013-T004
status: todo
status: done
priority: high
state_hub_task_id: "3fb2fe81-9819-4679-99d0-ad60ac9e8277"
```
@@ -176,11 +193,17 @@ and, later, package entry points.
Output: descriptor schema, registry API, duplicate/missing dependency
diagnostics, and tests.
Implemented: `markitect_tool.extension.registry` defines
`ExtensionDescriptor`, `OptionalDependency`, `ExtensionRegistry`,
`ExtensionDependencyCheck`, and `ExtensionRegistryError`, with descriptor
serialization, kind/capability lookup, duplicate-id diagnostics, dependency
checks, and factory instantiation tests.
## P13.5 - Add callback hooks and execution lifecycle
```task
id: MKTT-WP-0013-T005
status: todo
status: done
priority: medium
state_hub_task_id: "be8f2056-f413-44f9-be9c-6046c34e307e"
```
@@ -200,11 +223,15 @@ hidden global behavior.
Output: callback model and tests with fake extensions.
Implemented: `ExtensionLifecycle` and `ExtensionExecutor` provide explicit
before/success/failure/after callbacks, dependency checks before execution,
result type normalization, execution trace emission, and fake-extension tests.
## P13.6 - Refactor query engines behind registry
```task
id: MKTT-WP-0013-T006
status: todo
status: done
priority: high
state_hub_task_id: "0226c1d1-f583-43ad-8e20-f75f9790e17d"
```
@@ -215,11 +242,16 @@ compatibility.
Output: registered selector/jsonpath engines, compatibility shims, and tests.
Implemented: selector and JSONPath engines now live behind
`QueryEngineRegistry` descriptors, with compatibility shims for
`query_document`, `extract_document`, `query_document_jsonpath`, and
`extract_document_jsonpath`; CLI behavior remains unchanged.
## P13.7 - Refactor processors and local backend as registered extensions
```task
id: MKTT-WP-0013-T007
status: todo
status: done
priority: medium
state_hub_task_id: "a966dcbb-3ae8-47bf-85c8-4ba6ddcf7a31"
```
@@ -237,11 +269,16 @@ Focus areas:
Output: extension-backed processor/backend registration and regression tests.
Implemented: `builtin_extension_registry()` now exposes built-in query engines,
deterministic processors, and the local SQLite backend as extension
descriptors with capabilities, safety flags, CLI affordances, docs/examples,
diagnostic namespaces, and provenance prefixes.
## P13.8 - Refactor CLI composition to reduce central wiring
```task
id: MKTT-WP-0013-T008
status: todo
status: done
priority: medium
state_hub_task_id: "3e88ca62-8dba-4632-b5d0-29827d102322"
```
@@ -253,11 +290,17 @@ point.
Output: CLI extension hook, migrated command group examples, and unchanged
public CLI behavior.
Implemented first integration point: `markitect_tool.cli.extensions` derives
`CliCommandSpec` declarations from extension descriptors. Built-in query,
processor, and backend descriptors now expose command affordances such as
`mkt query`, `mkt process`, `mkt cache index`, and `mkt search` without making
the CLI module the only source of command metadata.
## P13.9 - Document extension authoring conventions
```task
id: MKTT-WP-0013-T009
status: todo
status: done
priority: medium
state_hub_task_id: "848e2a5e-c32b-4a94-906b-dc6aced4c71b"
```
@@ -275,6 +318,11 @@ Document how a new internal extension should be structured:
Output: extension authoring guide and one small template/example extension.
Implemented: `docs/extension-authoring.md` documents extension layout,
descriptor template, optional dependency declarations, processing envelopes,
diagnostics, provenance, safety/policy metadata, CLI affordances, tests, and
the boundary with business-facing workflows.
## Exit Criteria
- Existing behavior is covered by characterization tests before refactoring.