# Internal Extension Framework

## Purpose

Markitect has reached the point where optional features are useful but are
starting to concentrate wiring in central modules. Query engines, processors,
backend stores, references, contract checks, templates, generation adapters, and
CLI commands all need some combination of registration, capability metadata,
diagnostics, provenance, and optional dependency handling.

The internal extension framework should make those seams explicit without
turning the project into a heavy external plugin platform.

## Boundary

This framework is about internal extensibility:

```text
feature descriptor -> registry -> processing request/context/result
                   -> diagnostics/provenance/capabilities
                   -> CLI/API/backend integration
```

It is not the same as `MKTT-WP-0011` dataflow workflows. Workflows organize
business-facing processing steps for a document pipeline. The extension
framework organizes how Markitect itself exposes and composes capabilities.

## Extension Taxonomy

| Kind | Examples | Primary Contract |
| --- | --- | --- |
| `query-engine` | selector, JSONPath | document/data in, matches out |
| `processor` | identity, uppercase, include | fenced block in, processed result out |
| `backend` | local SQLite index | snapshots/index/search storage |
| `reference-provider` | section, region, fence, line | address in, content units out |
| `validator` | schema, contract, section assertion | document/context in, diagnostics out |
| `runtime` | context loader, form state, dynamic rules | document/contract/context in, diagnostics and state out |
| `assessment-runner` | provider-neutral rubric execution | assessment request in, normalized result out |
| `policy-gateway` | local label gateway, future external auth adapters | subject/action/object in, decision or filtered results out |
| `template-engine` | deterministic templates | template/data in, Markdown out |
| `generation-adapter` | provider-neutral assisted generation | request in, generated candidate out |
| `cli-group` | cache, backend, ref, class | command descriptors or registration hook |
| `render-export` | future Quarkdown/export adapters | Markdown source in, rendered/exported artifact out |
| `document-function` | future function layer | function call in, typed document value out |

## Canonical Lifecycle

An extension should be describable before it runs:

1. Register descriptor.
2. Check optional dependencies.
3. Check capabilities and policy labels.
4. Build processing context.
5. Execute operation.
6. Normalize result, diagnostics, provenance, and trace data.
7. Expose output through library API, CLI, backend, or workflow layer.

The framework should allow deterministic extensions to stay simple. Assisted,
external, networked, or filesystem-mutating extensions should declare that
explicitly before execution.

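As a rough illustration of declaring risk up front, the sketch below models the four categories named above as flags and compares them with what a caller allows before anything runs. `SafetyFlag` and `blocked_flags` are hypothetical names chosen for this example; they are not taken from the Markitect codebase.

```python
from enum import Flag, auto


class SafetyFlag(Flag):
    """Hypothetical safety declarations, one per category named in the text."""

    NONE = 0
    ASSISTED = auto()
    EXTERNAL = auto()
    NETWORKED = auto()
    MUTATES_FILESYSTEM = auto()


def blocked_flags(declared: SafetyFlag, allowed: SafetyFlag) -> SafetyFlag:
    """Return the declared flags that the caller has not allowed."""
    return declared & ~allowed


# A deterministic extension declares nothing, so nothing can be blocked.
deterministic = blocked_flags(SafetyFlag.NONE, allowed=SafetyFlag.NONE)

# A networked, filesystem-mutating extension run by a caller that only
# allows network access is stopped before execution.
risky = blocked_flags(
    SafetyFlag.NETWORKED | SafetyFlag.MUTATES_FILESYSTEM,
    allowed=SafetyFlag.NETWORKED,
)
if risky:
    raise_reason = "extension needs a permission the caller did not grant"
```

Keeping the check as a pure function of declared and allowed flags means it can run at step 3 of the lifecycle, before any context is built.
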
## Descriptor Shape

The first descriptor model should cover:

- stable id
- kind
- version
- summary
- implementation reference
- capability declarations
- optional dependency declarations
- safety flags
- input and output contract names
- diagnostics namespace
- provenance operation prefix
- documentation and example links
- CLI affordances where applicable

Descriptors are not meant to replace implementation modules. They are the small
declarative surface that lets Markitect inspect, list, validate, and compose
capabilities consistently.

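The field list above maps naturally onto a frozen dataclass. The following is a minimal sketch under that assumption: the class name `ExtensionDescriptor`, every field name, and all sample values (including the `module:attribute` implementation reference format) are illustrative guesses, not the project's actual model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtensionDescriptor:
    """Hypothetical descriptor; one field per item in the list above."""

    id: str
    kind: str
    version: str
    summary: str
    implementation: str  # assumed "module:attribute" import reference
    capabilities: tuple[str, ...] = ()
    optional_dependencies: tuple[str, ...] = ()
    safety_flags: tuple[str, ...] = ()
    input_contract: str = ""
    output_contract: str = ""
    diagnostics_namespace: str = ""
    provenance_prefix: str = ""
    docs: tuple[str, ...] = ()
    cli_commands: tuple[str, ...] = ()


# Illustrative values only; they do not describe the real built-in backend.
LOCAL_SQLITE = ExtensionDescriptor(
    id="backend.local-sqlite",
    kind="backend",
    version="0.1.0",
    summary="Local SQLite snapshot, index, and search storage.",
    implementation="markitect_tool.extensions.backend_local_sqlite:create",
    capabilities=("fts",),
    diagnostics_namespace="backend.local-sqlite",
    provenance_prefix="backend.local-sqlite",
)
```

Because the descriptor is plain data, it can be listed, serialized, or validated without importing the implementation module it points to.
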
## Processing Model

The canonical processing model should define a small set of shared envelopes:

- `ProcessingRequest`: operation id, input payload, options, scope
- `ProcessingContext`: root, source path, namespaces, variables, policy, backend
  handles, and caller metadata
- `ProcessingResult`: output payload, diagnostics, provenance, dependencies,
  trace events, and validity
- `ProcessingDiagnostic`: severity, code, message, source, help
- `ProcessingCapability`: declared feature or permission requirement
- `ProcessingProvenance`: operation, source identity, snapshot/content hashes,
  dependencies, backend/provider metadata

Subsystem-specific types may remain richer. The canonical model is the bridge,
not a forced replacement for every local dataclass.

### Processing Request Example

```python
from pathlib import Path

from markitect_tool.extension import ProcessingContext, ProcessingRequest

request = ProcessingRequest(
    operation="query.selector",
    input={"selector": "sections[heading=Decision]"},
    context=ProcessingContext(
        source_path=Path("docs/adr.md"),
        namespaces={"std": "standards"},
        variables={"audience": "internal"},
    ),
    options={"format": "json"},
    scope="document",
)
```

The request cache key includes operation, input, options, scope, declared
capabilities, metadata, and stable context semantics:

- source path
- namespaces
- variables
- policy
- metadata

It intentionally excludes root path, caller name, and live backend handles.
Those are execution-environment details. If they matter semantically for an
extension, put explicit values in request options or metadata.

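To make the include/exclude rule concrete, here is one way such a key could be derived. It is a sketch only: the function name `request_cache_key`, the use of plain mappings instead of the real envelope classes, and the choice of canonical JSON hashed with SHA-256 are all assumptions made for this example.

```python
import hashlib
import json
from typing import Any, Mapping

# Context fields treated as semantic. Root path, caller name, and backend
# handles are deliberately absent, so they never influence the key.
SEMANTIC_CONTEXT_FIELDS = ("source_path", "namespaces", "variables", "policy", "metadata")


def request_cache_key(request: Mapping[str, Any], context: Mapping[str, Any]) -> str:
    """Hash the semantic parts of a request into a stable hex digest."""
    payload = {
        "operation": request["operation"],
        "input": request.get("input"),
        "options": request.get("options", {}),
        "scope": request.get("scope"),
        "capabilities": sorted(request.get("capabilities", [])),
        "metadata": request.get("metadata", {}),
        "context": {name: context.get(name) for name in SEMANTIC_CONTEXT_FIELDS},
    }
    # sort_keys makes the encoding independent of dict insertion order;
    # default=str covers values such as pathlib.Path.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


request = {
    "operation": "query.selector",
    "input": {"selector": "sections[heading=Decision]"},
    "options": {"format": "json"},
    "scope": "document",
}
on_laptop = {"root": "/home/a/repo", "caller": "cli", "source_path": "docs/adr.md"}
on_ci = {"root": "/builds/repo", "caller": "api", "source_path": "docs/adr.md"}
same_key = request_cache_key(request, on_laptop) == request_cache_key(request, on_ci)
```

In this sketch two runs that differ only in checkout location or caller share a cache entry, while any change to variables, namespaces, or policy produces a new key.
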
### Processing Result Example

```python
from markitect_tool.extension import (
    ProcessingProvenance,
    ProcessingResult,
    ProcessingTrace,
)

result = ProcessingResult(
    output={"count": 1},
    provenance=[
        ProcessingProvenance(
            operation="query.selector",
            source_path="docs/adr.md",
            content_hash="sha256:...",
        )
    ],
).with_trace(ProcessingTrace(event="query.done"))
```

`ProcessingResult.valid` is derived from diagnostics. Any diagnostic with
severity `error` makes the result invalid.

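The derivation rule can be expressed as a read-only property. The classes below are simplified stand-ins written for this example (`Diagnostic` and `Result` are hypothetical names with fewer fields than the real envelopes), intended only to show the rule itself.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Diagnostic:
    """Stand-in for ProcessingDiagnostic with only the fields needed here."""

    severity: str
    code: str
    message: str


@dataclass
class Result:
    """Stand-in for ProcessingResult; validity is computed, never stored."""

    output: Any = None
    diagnostics: list[Diagnostic] = field(default_factory=list)

    @property
    def valid(self) -> bool:
        # Warnings and notes do not affect validity; a single error does.
        return not any(d.severity == "error" for d in self.diagnostics)


clean = Result(output={"count": 1})
warned = Result(diagnostics=[Diagnostic("warning", "Q001", "selector matched nothing")])
failed = Result(diagnostics=[Diagnostic("error", "Q002", "selector could not be parsed")])
```

Computing validity on access, rather than storing a flag, keeps it from drifting out of sync when diagnostics are appended later.
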
## Registration Strategy

Start with in-package registration:

```text
markitect_tool/extensions/
  query_selector.py
  query_jsonpath.py
  backend_local_sqlite.py
  processors_builtin.py
```

Each module exposes one or more descriptors plus a registration function. The
root registry can be assembled explicitly at import time or by a small internal
discovery list. Package entry points can be added later if external extension
packages become a real requirement.

See `docs/extension-authoring.md` for the extension authoring checklist and
descriptor template.

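The in-package registration described in this section might look roughly like the sketch below. For brevity it collapses two modules into one file and uses plain dicts as descriptors; the names `register_selector`, `register_jsonpath`, `BUILTIN_REGISTRATIONS`, and `build_registry`, as well as the id `query.jsonpath`, are hypothetical.

```python
from typing import Callable

Registry = dict[str, dict]

# --- what query_selector.py might expose -----------------------------------
SELECTOR_DESCRIPTOR = {"id": "query.selector", "kind": "query-engine"}


def register_selector(registry: Registry) -> None:
    registry[SELECTOR_DESCRIPTOR["id"]] = SELECTOR_DESCRIPTOR


# --- what query_jsonpath.py might expose -----------------------------------
JSONPATH_DESCRIPTOR = {
    "id": "query.jsonpath",
    "kind": "query-engine",
    "optional_dependencies": ["jsonpath"],
}


def register_jsonpath(registry: Registry) -> None:
    registry[JSONPATH_DESCRIPTOR["id"]] = JSONPATH_DESCRIPTOR


# --- the small internal discovery list --------------------------------------
# An explicit tuple keeps registration order deterministic and easy to review.
BUILTIN_REGISTRATIONS: tuple[Callable[[Registry], None], ...] = (
    register_selector,
    register_jsonpath,
)


def build_registry() -> Registry:
    """Assemble the root registry by calling each registration function."""
    registry: Registry = {}
    for register in BUILTIN_REGISTRATIONS:
        register(registry)
    return registry


registry = build_registry()
```

Swapping the tuple for package entry points later would change only `build_registry`, which is the property the text above is after.
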
### Registry Use

Extension registries are optimized for common lookup patterns:

- `registry.get("backend.local-sqlite")`
- `registry.list(kind="query-engine")`
- `registry.require_capability("fts")`
- `registry.check_dependencies("jsonpath")`

Kinds and capabilities are indexed at registration time, so large registries can
avoid repeated full scans for basic discovery.

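A minimal sketch of registration-time indexing follows. It mirrors three of the method names listed above, but the class `IndexedRegistry`, the `Descriptor` stand-in, the return types, and the error types are assumptions for illustration rather than the real registry's behavior.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Descriptor:
    """Stand-in descriptor with only the indexed fields."""

    id: str
    kind: str
    capabilities: tuple[str, ...] = ()


class IndexedRegistry:
    """Maintains id, kind, and capability indexes as descriptors are added."""

    def __init__(self) -> None:
        self._by_id = {}                          # id -> Descriptor
        self._by_kind = defaultdict(list)         # kind -> [id, ...]
        self._by_capability = defaultdict(list)   # capability -> [id, ...]

    def register(self, descriptor: Descriptor) -> None:
        if descriptor.id in self._by_id:
            raise ValueError(f"duplicate extension id: {descriptor.id}")
        self._by_id[descriptor.id] = descriptor
        self._by_kind[descriptor.kind].append(descriptor.id)
        for capability in descriptor.capabilities:
            self._by_capability[capability].append(descriptor.id)

    def get(self, extension_id: str) -> Descriptor:
        return self._by_id[extension_id]

    def require_capability(self, capability: str) -> tuple[Descriptor, ...]:
        ids = self._by_capability.get(capability, [])
        if not ids:
            raise LookupError(f"no extension provides capability: {capability}")
        return tuple(self._by_id[i] for i in ids)

    def list(self, kind: Optional[str] = None) -> tuple[Descriptor, ...]:
        # Index lookup instead of scanning every descriptor.
        ids = self._by_id if kind is None else self._by_kind.get(kind, [])
        return tuple(self._by_id[i] for i in ids)


registry = IndexedRegistry()
registry.register(Descriptor("backend.local-sqlite", "backend", ("fts",)))
registry.register(Descriptor("query.selector", "query-engine"))
```

Each lookup touches only the matching index bucket, so its cost depends on the number of matches rather than on the size of the registry.
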
The same registry is exposed on the CLI for practical discovery:

```bash
mkt extension list
mkt extension inspect memory.context-package
mkt extension commands
```

This makes the extension catalog part of the user-visible contract. When a
capability gains a CLI command, the descriptor should declare that command so
generated docs and audits can compare intent with the live surface.

### Execution Lifecycle

`ExtensionExecutor` wraps a descriptor factory with deterministic lifecycle
hooks:

1. Fetch descriptor.
2. Check required optional dependencies.
3. Instantiate callable implementation.
4. Run `before` callbacks.
5. Execute implementation.
6. Normalize result type.
7. Append `extension.executed` trace.
8. Run success or failure callbacks.
9. Run final `after` callbacks.

Callbacks are explicit. The framework does not introduce hidden global behavior.

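The nine steps above can be sketched as a single function. This is not the `ExtensionExecutor` implementation: `run_extension`, `LifecycleCallbacks`, the dict-based descriptor with a `requires` list, and the dict-shaped result are all hypothetical simplifications chosen to show the ordering.

```python
import importlib.util
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class LifecycleCallbacks:
    """Explicit hook lists passed in by the caller; nothing is global."""

    before: list[Callable[..., None]] = field(default_factory=list)
    success: list[Callable[..., None]] = field(default_factory=list)
    failure: list[Callable[..., None]] = field(default_factory=list)
    after: list[Callable[..., None]] = field(default_factory=list)


def run_extension(
    registry: dict[str, dict],
    factory: Callable[[dict], Callable[[Any], Any]],
    extension_id: str,
    request: Any,
    callbacks: LifecycleCallbacks,
) -> dict:
    descriptor = registry[extension_id]                       # 1. fetch descriptor
    missing = [
        name
        for name in descriptor.get("requires", ())
        if importlib.util.find_spec(name) is None             # 2. check dependencies
    ]
    if missing:
        raise RuntimeError(f"{extension_id}: missing optional dependencies {missing}")
    implementation = factory(descriptor)                      # 3. instantiate
    try:
        for hook in callbacks.before:                         # 4. before callbacks
            hook(descriptor, request)
        raw = implementation(request)                         # 5. execute
        # 6. normalize: bare return values are wrapped in a result dict
        result = dict(raw) if isinstance(raw, dict) else {"output": raw}
        executed = {"event": "extension.executed", "extension": extension_id}
        result["trace"] = [*result.get("trace", []), executed]  # 7. append trace
    except Exception as error:
        for hook in callbacks.failure:                        # 8. failure callbacks
            hook(descriptor, error)
        raise
    else:
        for hook in callbacks.success:                        # 8. success callbacks
            hook(descriptor, result)
        return result
    finally:
        for hook in callbacks.after:                          # 9. always runs last
            hook(descriptor)
```

Using `try`/`except`/`else`/`finally` keeps the ordering guarantees in one place: `after` hooks run on both paths, and an exception raised by a success hook is not reported to the failure hooks.
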
## Compatibility Rules

The refactor must preserve:

- current library APIs such as `query_document`
- current CLI commands and output envelopes
- current diagnostic codes where users may rely on them
- current provenance operation strings unless intentionally deprecated
- optional dependency behavior for JSONPath and future adapters
- cache/index file compatibility unless a migration is documented

The first implementation adds canonical processing envelopes, extension
descriptors, registries, lifecycle callbacks, query-engine registry shims,
built-in extension descriptors, and CLI command specs while preserving existing
public commands.

## Characterization Coverage

Before refactoring, lock down:

- selector query and extraction
- optional JSONPath diagnostics
- processor registry behavior and provenance
- backend manifest registry and capability checks
- local SQLite snapshot/index/search behavior
- content reference resolution
- representative CLI command envelopes
- provenance and diagnostic shapes

These tests are deliberately a little redundant with unit tests. Their job is
to protect the current public behavior while internals move behind extension
descriptors and registries.
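
One low-cost way to pin envelope and diagnostic shapes without freezing every value is to compare structure rather than data. The helper below is a sketch of that idea; `envelope_shape` is a hypothetical name, and the sample envelope is invented for illustration rather than copied from real `mkt` output.

```python
def envelope_shape(value):
    """Reduce a JSON-like value to its structure: keys and type names only."""
    if isinstance(value, dict):
        return {key: envelope_shape(item) for key, item in sorted(value.items())}
    if isinstance(value, list):
        # The first element stands in for the whole list; empty stays empty.
        return [envelope_shape(value[0])] if value else []
    return type(value).__name__


# Shape recorded before the refactor (invented sample).
recorded = envelope_shape(
    {"ok": True, "results": [{"path": "docs/adr.md", "line": 3}], "diagnostics": []}
)

# Output produced after the refactor: different data, same structure.
current = envelope_shape(
    {"ok": False, "results": [{"path": "notes.md", "line": 9}], "diagnostics": []}
)

shape_preserved = recorded == current
```

A characterization test built this way fails when a key is renamed or a type changes, which is the kind of drift the list above is meant to catch, while staying quiet about ordinary data differences.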