Extensible canonical internal processing refactoring

2026-05-04 11:06:11 +02:00
parent 4a16ccf1e1
commit d977f9e67c
20 changed files with 1815 additions and 16 deletions

docs/extension-authoring.md Normal file

@@ -0,0 +1,178 @@
# Internal Extension Authoring
## Purpose
This guide describes how to add a new internal Markitect extension without
turning central modules into the main integration surface.
Use this for internal query engines, processors, backend/index stores,
reference providers, validators, template/generation adapters, CLI command
groups, render/export adapters, and future document functions.
## Recommended Shape
Each extension should have:
- implementation module
- descriptor or descriptor factory
- focused tests
- characterization coverage if it changes existing behavior
- documentation or example link
- diagnostic namespace
- provenance operation prefix
- optional dependency declaration
- capability and safety declarations
Prefer this file layout:
```text
src/markitect_tool/<area>/<feature>.py
tests/test_<area>_<feature>.py
docs/<feature>.md
```
If the extension is cross-cutting, register it from
`markitect_tool.extension.builtins` or a future internal discovery module rather
than importing it from many central files.
## Descriptor Template
```python
from markitect_tool.extension import ExtensionDescriptor, ProcessingCapability


def my_extension_descriptor() -> ExtensionDescriptor:
    return ExtensionDescriptor(
        id="query.example",
        kind="query-engine",
        summary="Example query engine.",
        capabilities=[
            ProcessingCapability(id="ast", kind="read"),
        ],
        input_contract="Document + example expression",
        output_contract="QueryMatch[]",
        diagnostics_namespace="query.example",
        provenance_prefix="query.example",
        cli={"commands": ["mkt query --engine example"]},
        docs=["docs/example-query.md"],
        examples=["examples/query/example.md"],
    )
```
## Optional Dependencies
Declare optional dependencies in descriptors:
```python
from markitect_tool.extension import OptionalDependency

OptionalDependency(
    name="jsonpath_ng",
    package="jsonpath-ng",
    extra="query",
    required=True,
    purpose="Evaluate JSONPath expressions.",
)
```
If a dependency is missing, return a structured diagnostic. Do not fail with an
unexplained import error.
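One way to follow this rule is to probe for the module before importing it and to turn a miss into a diagnostic value. The sketch below is illustrative only: `SketchDiagnostic` and `check_optional_dependency` are hypothetical names, not part of `markitect_tool`, and the field set is an assumption.

```python
from dataclasses import dataclass
from importlib.util import find_spec
from typing import Optional


@dataclass(frozen=True)
class SketchDiagnostic:
    """Hypothetical stand-in; not Markitect's real diagnostic type."""

    severity: str
    code: str
    message: str
    help: str


def check_optional_dependency(
    module: str, package: str, extra: str
) -> Optional[SketchDiagnostic]:
    """Return None if `module` is importable, else a structured diagnostic."""
    # find_spec() locates a top-level module without importing it.
    if find_spec(module) is not None:
        return None
    return SketchDiagnostic(
        severity="error",
        code="extension.missing_dependency",
        message=f"Optional dependency '{package}' is not installed.",
        help=f"Install the '{extra}' extra to enable this feature.",
    )
```

A caller would check the return value before importing the optional module and attach the diagnostic to its result instead of letting `ImportError` escape.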
## Processing Envelopes
Use canonical processing envelopes where an extension needs a shared execution
boundary:
- `ProcessingRequest`
- `ProcessingContext`
- `ProcessingResult`
- `ProcessingCapability`
- `ProcessingProvenance`
- `ProcessingTrace`
Subsystem-specific dataclasses may remain richer. The canonical model is the
bridge that lets callbacks, registries, diagnostics, provenance, and future
policy checks interact consistently.
## Diagnostics
Diagnostics should be:
- stable enough for tests and callers
- namespaced by subsystem or extension
- explicit about optional dependency failures
- tied to source locations where possible
- emitted as `Diagnostic` or `ProcessingResult.from_error`
Recommended diagnostic code style:
```text
<extension-kind>.<condition>
query.invalid_jsonpath
processor.unknown
extension.missing_dependency
backend.local_sqlite.invalid_fts_query
```
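A test can keep codes in this style by checking them against a pattern. The following is a minimal sketch that assumes codes are two or more lowercase, dot-separated segments; the pattern and the helper names are illustrative and not taken from the codebase.

```python
import re
from typing import Tuple

# Assumed convention: two or more lowercase segments joined by dots.
_CODE_PATTERN = re.compile(r"[a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*)+")


def is_namespaced_code(code: str) -> bool:
    """Return True when `code` looks like `<namespace>.<condition>`."""
    return _CODE_PATTERN.fullmatch(code) is not None


def split_code(code: str) -> Tuple[str, str]:
    """Split a code into (namespace, condition) at the last dot."""
    namespace, _, condition = code.rpartition(".")
    return namespace, condition
```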
## Provenance
Every extension that transforms, queries, reads, writes, generates, or indexes
content should expose provenance. Use a stable operation prefix:
```text
query.selector
query.jsonpath
processor.include
local_snapshot_store.put_file
```
Include source path, content hash, snapshot id, backend/provider id, and
dependencies when known.
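As a rough illustration, a provenance record with these fields could be built as shown below. `SketchProvenance` and `provenance_for` are hypothetical names, and the use of SHA-256 with a `sha256:` prefix is an assumption rather than the project's documented hash format.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class SketchProvenance:
    """Hypothetical record; fields left as None are simply not known."""

    operation: str
    source_path: Optional[str] = None
    content_hash: Optional[str] = None
    snapshot_id: Optional[str] = None
    provider_id: Optional[str] = None
    dependencies: Tuple[str, ...] = ()


def provenance_for(
    operation: str,
    text: str,
    source_path: Optional[str] = None,
    dependencies: Iterable[str] = (),
) -> SketchProvenance:
    """Build a record for `operation` over `text`, hashing the content."""
    # Assumed hash scheme: SHA-256 over the UTF-8 bytes of the content.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return SketchProvenance(
        operation=operation,
        source_path=source_path,
        content_hash=f"sha256:{digest}",
        dependencies=tuple(dependencies),
    )
```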
## Safety And Policy
Descriptors should declare safety-relevant behavior:
- reads files
- writes local cache
- writes user output files
- accesses network
- invokes external process
- calls assisted-generation provider
- transmits content outside the local process
The initial framework only records this metadata; enforcement is left to later
policy layers.
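To make the idea concrete, the behaviors listed above could be modeled as combinable flags that a later policy layer compares against an allowed set. This is only a sketch: `SafetyFlag` and `violations` are hypothetical names, and the real descriptor may represent safety metadata differently.

```python
from enum import Flag, auto


class SafetyFlag(Flag):
    """Hypothetical flags mirroring the safety-relevant behaviors above."""

    NONE = 0
    READS_FILES = auto()
    WRITES_LOCAL_CACHE = auto()
    WRITES_USER_OUTPUT = auto()
    ACCESSES_NETWORK = auto()
    INVOKES_EXTERNAL_PROCESS = auto()
    CALLS_GENERATION_PROVIDER = auto()
    TRANSMITS_CONTENT = auto()


def violations(declared: SafetyFlag, allowed: SafetyFlag) -> SafetyFlag:
    """Return the declared behaviors that the policy does not allow."""
    return declared & ~allowed
```

An empty result is falsy, so a policy check can read `if violations(declared, allowed): ...`.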
## CLI Affordances
If an extension exposes CLI behavior, declare it in `descriptor.cli`:
```python
cli={"commands": ["mkt cache index", "mkt search"]}
```
`markitect_tool.cli.extensions.collect_cli_command_specs()` can inspect these
affordances without importing Click command implementations.
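The sketch below shows the general idea of reading command strings from descriptor metadata alone. It does not reproduce the real `collect_cli_command_specs()` signature or return type; `SketchDescriptor`, `collect_commands`, and the `backend.local_sqlite` id are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class SketchDescriptor:
    """Hypothetical descriptor carrying only the fields this sketch needs."""

    id: str
    cli: Dict[str, List[str]] = field(default_factory=dict)


def collect_commands(
    descriptors: Iterable[SketchDescriptor],
) -> List[Tuple[str, str]]:
    """Return sorted (command, extension id) pairs from descriptor metadata."""
    specs = []
    for descriptor in descriptors:
        # Only declared strings are read; no command implementation is imported.
        for command in descriptor.cli.get("commands", []):
            specs.append((command, descriptor.id))
    return sorted(specs)
```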
## Testing Checklist
Add tests for:
- descriptor serialization
- registry lookup and duplicate handling
- missing optional dependency diagnostics
- canonical result validity
- provenance shape
- CLI output envelope if public commands are exposed
- compatibility shim if replacing an existing API
When refactoring an existing feature, add characterization tests first, then
migrate implementation behind descriptors or registries.
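For the first checklist item, a serialization test might look like the sketch below. It uses a hypothetical `SketchDescriptor` dataclass in place of the real `ExtensionDescriptor`, whose serialization API is not shown in this guide.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass(frozen=True)
class SketchDescriptor:
    """Hypothetical stand-in for the real descriptor type."""

    id: str
    kind: str
    summary: str
    docs: List[str] = field(default_factory=list)


def test_descriptor_round_trips_through_json() -> None:
    descriptor = SketchDescriptor(
        id="query.example",
        kind="query-engine",
        summary="Example query engine.",
        docs=["docs/example-query.md"],
    )
    # Serialize to JSON text and back to catch non-serializable fields.
    payload = json.loads(json.dumps(asdict(descriptor), sort_keys=True))
    assert payload["id"] == "query.example"
    assert SketchDescriptor(**payload) == descriptor
```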
## Boundary With Workflows
Internal extensions describe what Markitect can do. Workflows describe how a
user combines capabilities for a concrete document pipeline.
An extension may expose a workflow step later, but it should not depend on the
workflow engine to be useful from the library or CLI.


@@ -0,0 +1,149 @@
# Internal Extension Framework
## Purpose
Markitect has reached the point where optional features are useful but are
starting to concentrate wiring in central modules. Query engines, processors,
backend stores, references, contract checks, templates, generation adapters, and
CLI commands all need some combination of registration, capability metadata,
diagnostics, provenance, and optional dependency handling.
The internal extension framework should make those seams explicit without
turning the project into a heavy external plugin platform.
## Boundary
This framework is about internal extensibility:
```text
feature descriptor -> registry -> processing request/context/result
                   -> diagnostics/provenance/capabilities
                   -> CLI/API/backend integration
```
It is not the same as `MKTT-WP-0011` dataflow workflows. Workflows organize
business-facing processing steps for a document pipeline. The extension
framework organizes how Markitect itself exposes and composes capabilities.
## Extension Taxonomy
| Kind | Examples | Primary Contract |
| --- | --- | --- |
| `query-engine` | selector, JSONPath | document/data in, matches out |
| `processor` | identity, uppercase, include | fenced block in, processed result out |
| `backend` | local SQLite index | snapshots/index/search storage |
| `reference-provider` | section, region, fence, line | address in, content units out |
| `validator` | schema, contract, section assertion | document/context in, diagnostics out |
| `template-engine` | deterministic templates | template/data in, Markdown out |
| `generation-adapter` | provider-neutral assisted generation | request in, generated candidate out |
| `cli-group` | cache, backend, ref, class | command descriptors or registration hook |
| `render-export` | future Quarkdown/export adapters | Markdown source in, rendered/exported artifact out |
| `document-function` | future function layer | function call in, typed document value out |
## Canonical Lifecycle
An extension should be describable before it runs:
1. Register descriptor.
2. Check optional dependencies.
3. Check capabilities and policy labels.
4. Build processing context.
5. Execute operation.
6. Normalize result, diagnostics, provenance, and trace data.
7. Expose output through library API, CLI, backend, or workflow layer.
The framework should allow deterministic extensions to stay simple. Assisted,
external, networked, or filesystem-mutating extensions should declare that
explicitly before execution.
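The lifecycle above can be sketched as a single driver function. Everything here is a simplified assumption: `SketchDescriptor`, `SketchResult`, `run_extension`, the string policy labels, and the `extension.policy_denied` code are illustrative names, not the framework's API.

```python
from dataclasses import dataclass, field
from importlib.util import find_spec
from typing import Any, Callable, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class SketchDescriptor:
    id: str
    modules: Tuple[str, ...] = ()         # importable names the handler needs
    safety: FrozenSet[str] = frozenset()  # declared policy labels


@dataclass
class SketchResult:
    output: Any = None
    diagnostics: List[str] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)

    @property
    def valid(self) -> bool:
        return not self.diagnostics


def run_extension(
    descriptor: SketchDescriptor,
    handler: Callable[[Any, Dict[str, str]], Any],
    payload: Any,
    registry: Dict[str, SketchDescriptor],
    allowed: FrozenSet[str],
) -> SketchResult:
    result = SketchResult()
    registry[descriptor.id] = descriptor                 # 1. register
    result.trace.append("registered")
    missing = [m for m in descriptor.modules if find_spec(m) is None]
    if missing:                                          # 2. dependencies
        result.diagnostics.append("extension.missing_dependency")
        return result
    result.trace.append("dependencies_ok")
    if descriptor.safety - allowed:                      # 3. policy labels
        result.diagnostics.append("extension.policy_denied")
        return result
    result.trace.append("policy_ok")
    context = {"extension_id": descriptor.id}            # 4. build context
    result.output = handler(payload, context)            # 5. execute
    result.trace.append("executed")                      # 6. normalize
    return result                                        # 7. expose to caller
```

Because the checks run before the handler, a denied or unavailable extension never executes and still yields a result the caller can inspect.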
## Descriptor Shape
The first descriptor model should cover:
- stable id
- kind
- version
- summary
- implementation reference
- capability declarations
- optional dependency declarations
- safety flags
- input and output contract names
- diagnostics namespace
- provenance operation prefix
- documentation and example links
- CLI affordances where applicable
Descriptors are not meant to replace implementation modules. They are the small
declarative surface that lets Markitect inspect, list, validate, and compose
capabilities consistently.
## Processing Model
The canonical processing model should define a small set of shared envelopes:
- `ProcessingRequest`: operation id, input payload, options, scope
- `ProcessingContext`: root, source path, namespaces, variables, policy, backend
  handles, and caller metadata
- `ProcessingResult`: output payload, diagnostics, provenance, dependencies,
  trace events, and validity
- `ProcessingDiagnostic`: severity, code, message, source, help
- `ProcessingCapability`: declared feature or permission requirement
- `ProcessingProvenance`: operation, source identity, snapshot/content hashes,
  dependencies, backend/provider metadata
Subsystem-specific types may remain richer. The canonical model is the bridge,
not a forced replacement for every local dataclass.
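A minimal sketch of three of these envelopes, using the fields listed above, might look as follows. The `Sketch*` classes are hypothetical; the real types may differ in names, defaults, and in how validity is computed (here it is assumed to mean "no error-severity diagnostics").

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class SketchDiagnostic:
    severity: str
    code: str
    message: str
    source: Optional[str] = None
    help: Optional[str] = None


@dataclass(frozen=True)
class SketchRequest:
    operation: str
    payload: Any
    options: Dict[str, Any] = field(default_factory=dict)
    scope: Optional[str] = None


@dataclass
class SketchResult:
    output: Any = None
    diagnostics: List[SketchDiagnostic] = field(default_factory=list)
    provenance: List[Dict[str, Any]] = field(default_factory=list)
    dependencies: List[str] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)

    @property
    def valid(self) -> bool:
        # Assumption: warnings do not invalidate a result, errors do.
        return not any(d.severity == "error" for d in self.diagnostics)

    @classmethod
    def from_error(
        cls, code: str, message: str, source: Optional[str] = None
    ) -> "SketchResult":
        """Wrap a single error diagnostic in an otherwise empty result."""
        return cls(diagnostics=[SketchDiagnostic("error", code, message, source)])
```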
## Registration Strategy
Start with in-package registration:
```text
markitect_tool/extensions/
  query_selector.py
  query_jsonpath.py
  backend_local_sqlite.py
  processors_builtin.py
```
Each module exposes one or more descriptors plus a registration function. The
root registry can be assembled explicitly at import time or by a small internal
discovery list. Package entry points can be added later if external extension
packages become a real requirement.
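The explicit-assembly variant could be sketched as below: each module contributes a registration function, and a small internal list drives assembly. `SketchRegistry`, the dict-shaped descriptors, and the descriptor ids are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


class DuplicateExtensionError(ValueError):
    """Raised when two descriptors claim the same id."""


class SketchRegistry:
    def __init__(self) -> None:
        self._by_id: Dict[str, dict] = {}

    def register(self, descriptor: dict) -> None:
        ext_id = descriptor["id"]
        if ext_id in self._by_id:
            raise DuplicateExtensionError(f"duplicate extension id: {ext_id}")
        self._by_id[ext_id] = descriptor

    def get(self, ext_id: str) -> dict:
        return self._by_id[ext_id]

    def ids(self, kind: str) -> List[str]:
        return sorted(i for i, d in self._by_id.items() if d["kind"] == kind)


# In the real layout these functions would live in separate modules.
def register_query_selector(registry: SketchRegistry) -> None:
    registry.register({"id": "query.selector", "kind": "query-engine"})


def register_query_jsonpath(registry: SketchRegistry) -> None:
    registry.register({"id": "query.jsonpath", "kind": "query-engine"})


# The small internal discovery list: explicit, ordered, easy to audit.
BUILTIN_REGISTRATIONS: Tuple[Callable[[SketchRegistry], None], ...] = (
    register_query_selector,
    register_query_jsonpath,
)


def build_registry() -> SketchRegistry:
    registry = SketchRegistry()
    for register in BUILTIN_REGISTRATIONS:
        register(registry)
    return registry
```

Keeping the list explicit means that adding an extension touches one central line instead of several call sites, and duplicate ids fail loudly at assembly time.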
See `docs/extension-authoring.md` for the extension authoring checklist and
descriptor template.
## Compatibility Rules
The refactor must preserve:
- current library APIs such as `query_document`
- current CLI commands and output envelopes
- current diagnostic codes where users may rely on them
- current provenance operation strings unless intentionally deprecated
- optional dependency behavior for JSONPath and future adapters
- cache/index file compatibility unless a migration is documented
The first implementation adds canonical processing envelopes, extension
descriptors, registries, lifecycle callbacks, query-engine registry shims,
built-in extension descriptors, and CLI command specs while preserving existing
public commands.
## Characterization Coverage
Before refactoring, lock down:
- selector query and extraction
- optional JSONPath diagnostics
- processor registry behavior and provenance
- backend manifest registry and capability checks
- local SQLite snapshot/index/search behavior
- content reference resolution
- representative CLI command envelopes
- provenance and diagnostic shapes
These tests are deliberately a little redundant with unit tests. Their job is
to protect the current public behavior while internals move behind extension
descriptors and registries.
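A characterization test in this spirit pins the observable envelope of an existing entry point and asserts that the registry-backed path reproduces it exactly. The sketch below uses toy stand-ins: `legacy_uppercase`, `run_registered`, and the `processor.uppercase` operation string are hypothetical.

```python
from typing import Callable, Dict


def legacy_uppercase(text: str) -> dict:
    """Stand-in for an existing public function whose envelope must not change."""
    return {
        "output": text.upper(),
        "operation": "processor.uppercase",
        "diagnostics": [],
    }


# Stand-in for the new registry-backed dispatch.
PROCESSORS: Dict[str, Callable[[str], str]] = {"processor.uppercase": str.upper}


def run_registered(operation: str, text: str) -> dict:
    return {
        "output": PROCESSORS[operation](text),
        "operation": operation,
        "diagnostics": [],
    }


def test_uppercase_envelope_is_unchanged() -> None:
    sample = "hello fenced block"
    # Compare whole envelopes so that shape changes are caught, not just output.
    assert run_registered("processor.uppercase", sample) == legacy_uppercase(sample)
```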


@@ -34,7 +34,7 @@ and descriptions mirror the operational view.
| `MKTT-WP-0006` | complete | done | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T005` | Optional backend fabric is complete: manifests, capabilities, snapshot identity, interfaces, registry, provenance, and read-only CLI scaffolding. |
| `MKTT-WP-0010` | complete | done | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Content references, processors, explode/implode, weave/tangle, content classes, and migration examples are complete as the first WP-0010 extension layer. |
| `MKTT-WP-0007` | complete | done | `MKTT-WP-0006` | Advanced query and local index backend is complete: AST inspection, optional JSONPath, SQLite snapshots/metadata, FTS5 search, incremental refresh, and local index CLI. |
-| `MKTT-WP-0013` | P1 | todo | `MKTT-WP-0003`, `MKTT-WP-0004`, `MKTT-WP-0006`, `MKTT-WP-0007`, `MKTT-WP-0010` | Internal extension framework and canonical processing model: characterize current behavior, add registries/descriptors/callbacks, and reduce central wiring before heavier runtime/workflow work. |
+| `MKTT-WP-0013` | complete | done | `MKTT-WP-0003`, `MKTT-WP-0004`, `MKTT-WP-0006`, `MKTT-WP-0007`, `MKTT-WP-0010` | Internal extension framework is complete: characterization tests, canonical processing model, descriptors, registries, lifecycle callbacks, query-engine registry, built-in extension catalog, CLI command specs, and authoring guide. |
| `MKTT-WP-0005` | P2 | todo | `MKTT-WP-0003`, `MKTT-WP-0004` | Pick up when generation/form/context or semantic assessment pressure appears. |
| `MKTT-WP-0011` | P2 | todo | `MKTT-WP-0003`; task-level triggers: `MKTT-WP-0010-T001`, `MKTT-WP-0010-T005` | Declarative Markdown dataflow workflows: source extraction, deterministic/assisted processing, and multi-output generation. |
| `MKTT-WP-0009` | P2 | todo | `MKTT-WP-0006` | Establish access-control gateway before security-sensitive cache/context use. |