Extensible canonical internal processing refactoring
178
docs/extension-authoring.md
Normal file
@@ -0,0 +1,178 @@
# Internal Extension Authoring

## Purpose

This guide describes how to add a new internal Markitect extension without
turning central modules into the main integration surface.

Use this for internal query engines, processors, backend/index stores,
reference providers, validators, template/generation adapters, CLI command
groups, render/export adapters, and future document functions.

## Recommended Shape

Each extension should have:

- implementation module
- descriptor or descriptor factory
- focused tests
- characterization coverage if it changes existing behavior
- documentation or example link
- diagnostic namespace
- provenance operation prefix
- optional dependency declaration
- capability and safety declarations

Prefer this shape:

```text
src/markitect_tool/<area>/<feature>.py
tests/test_<area>_<feature>.py
docs/<feature>.md
```

If the extension is cross-cutting, register it from
`markitect_tool.extension.builtins` or a future internal discovery module rather
than importing it from many central files.

## Descriptor Template

```python
from markitect_tool.extension import ExtensionDescriptor, ProcessingCapability


def my_extension_descriptor() -> ExtensionDescriptor:
    return ExtensionDescriptor(
        id="query.example",
        kind="query-engine",
        summary="Example query engine.",
        capabilities=[
            ProcessingCapability(id="ast", kind="read"),
        ],
        input_contract="Document + example expression",
        output_contract="QueryMatch[]",
        diagnostics_namespace="query.example",
        provenance_prefix="query.example",
        cli={"commands": ["mkt query --engine example"]},
        docs=["docs/example-query.md"],
        examples=["examples/query/example.md"],
    )
```

## Optional Dependencies

Declare optional dependencies in descriptors:

```python
from markitect_tool.extension import OptionalDependency

OptionalDependency(
    name="jsonpath_ng",
    package="jsonpath-ng",
    extra="query",
    required=True,
    purpose="Evaluate JSONPath expressions.",
)
```

If a dependency is missing, return a structured diagnostic. Do not fail with an
unexplained import error.
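
The rule above could be implemented roughly as follows. This is a minimal
sketch, not Markitect's implementation: the `Diagnostic` dataclass and the
`check_optional_dependency` helper are hypothetical stand-ins, and only
`importlib.util.find_spec` is a real standard-library call.

```python
from dataclasses import dataclass
from importlib.util import find_spec
from typing import Optional


@dataclass(frozen=True)
class Diagnostic:
    """Hypothetical stand-in for Markitect's diagnostic type."""

    severity: str
    code: str
    message: str
    help: str = ""


def check_optional_dependency(name: str, package: str, extra: str) -> Optional[Diagnostic]:
    """Return None when the top-level module `name` is importable, else a diagnostic."""
    # find_spec locates the module without importing it, so a missing package
    # never surfaces as an unexplained ImportError.
    if find_spec(name) is not None:
        return None
    return Diagnostic(
        severity="error",
        code="extension.missing_dependency",
        message=f"Optional dependency '{package}' is not installed.",
        help=f"Install the '{extra}' extra to enable this feature.",
    )
```

A caller would run this check before dispatching to the extension and attach
the returned diagnostic to its result instead of raising.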

## Processing Envelopes

Use canonical processing envelopes where an extension needs a shared execution
boundary:

- `ProcessingRequest`
- `ProcessingContext`
- `ProcessingResult`
- `ProcessingCapability`
- `ProcessingProvenance`
- `ProcessingTrace`

Subsystem-specific dataclasses may remain richer. The canonical model is the
bridge that lets callbacks, registries, diagnostics, provenance, and future
policy checks interact consistently.

## Diagnostics

Diagnostics should be:

- stable enough for tests and callers
- namespaced by subsystem or extension
- explicit about optional dependency failures
- tied to source locations where possible
- emitted as `Diagnostic` or `ProcessingResult.from_error`

Recommended diagnostic code style:

```text
<extension-kind>.<condition>
query.invalid_jsonpath
processor.unknown
extension.missing_dependency
backend.local_sqlite.invalid_fts_query
```
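
A test suite could pin this convention with a small check. The sketch below
assumes that codes are two or more lowercase snake_case segments joined by
dots, which matches the examples above but is not a documented rule; both
helper names are hypothetical.

```python
import re

# Assumed convention: at least two lowercase snake_case segments joined by dots.
_CODE_PATTERN = re.compile(r"[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+")


def is_valid_diagnostic_code(code: str) -> bool:
    """Check that a diagnostic code follows the assumed dotted convention."""
    return _CODE_PATTERN.fullmatch(code) is not None


def in_namespace(code: str, namespace: str) -> bool:
    """Check that a code lives under the given diagnostics namespace."""
    return code.startswith(namespace + ".")
```

Checking namespaces on whole segments, rather than with a bare prefix match,
keeps `query` from accidentally claiming codes such as `query_cache.miss`.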

## Provenance

Every extension that transforms, queries, reads, writes, generates, or indexes
content should expose provenance. Use a stable operation prefix:

```text
query.selector
query.jsonpath
processor.include
local_snapshot_store.put_file
```

Include source path, content hash, snapshot id, backend/provider id, and
dependencies when known.
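
One way to carry those fields is a small immutable record. This is an
illustrative sketch only: `ProvenanceRecord`, its field names, and the choice
of SHA-256 for the content hash are assumptions, not the shape of Markitect's
`ProcessingProvenance`.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance record; unknown fields stay None."""

    operation: str
    source_path: Optional[str] = None
    content_hash: Optional[str] = None
    snapshot_id: Optional[str] = None
    backend_id: Optional[str] = None
    dependencies: Tuple[str, ...] = ()


def make_provenance(operation: str, source_path: str, data: bytes) -> ProvenanceRecord:
    """Build a record for content read from `source_path`."""
    # SHA-256 is an assumption here; any stable content hash would serve.
    digest = hashlib.sha256(data).hexdigest()
    return ProvenanceRecord(operation=operation, source_path=source_path, content_hash=digest)
```

Leaving unknown fields as `None` rather than inventing values keeps the record
honest about what the extension actually knew at execution time.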

## Safety And Policy

Descriptors should declare safety-relevant behavior:

- reads files
- writes local cache
- writes user output files
- accesses network
- invokes external process
- calls assisted-generation provider
- transmits content outside the local process

The initial framework records this metadata. Later policy layers can enforce it.
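
As a sketch of how a later policy layer might consume this metadata, the
behaviors above can be modeled as combinable flags. The `SafetyFlag` enum and
the `violations` helper are hypothetical; the framework described here only
records the declarations.

```python
from enum import Flag, auto


class SafetyFlag(Flag):
    """Hypothetical flags mirroring the safety-relevant behaviors listed above."""

    NONE = 0
    READS_FILES = auto()
    WRITES_LOCAL_CACHE = auto()
    WRITES_USER_OUTPUT = auto()
    ACCESSES_NETWORK = auto()
    INVOKES_EXTERNAL_PROCESS = auto()
    CALLS_GENERATION_PROVIDER = auto()
    TRANSMITS_CONTENT = auto()


def violations(declared: SafetyFlag, allowed: SafetyFlag) -> SafetyFlag:
    """Return the declared behaviors that the policy does not allow."""
    return declared & ~allowed
```

An empty result is falsy, so a policy check can read
`if violations(declared, allowed): ...` and report the offending flags in a
diagnostic.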

## CLI Affordances

If an extension exposes CLI behavior, declare it in `descriptor.cli`:

```python
cli={"commands": ["mkt cache index", "mkt search"]}
```

`markitect_tool.cli.extensions.collect_cli_command_specs()` can inspect these
affordances without importing Click command implementations.
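
The idea of inspecting declared commands without touching Click can be
illustrated with plain data. The function below is a hypothetical stand-in
written for this guide, not the body of `collect_cli_command_specs()`, and it
assumes descriptors are available as dictionaries with `id` and `cli` keys.

```python
from typing import Any, Dict, Iterable, List


def collect_commands(descriptors: Iterable[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Flatten declared CLI commands into (extension id, command) records."""
    specs: List[Dict[str, str]] = []
    for descriptor in descriptors:
        # Descriptors without a "cli" entry simply contribute no commands.
        for command in descriptor.get("cli", {}).get("commands", []):
            specs.append({"extension": descriptor["id"], "command": command})
    return specs
```

Because only descriptor metadata is read, listing commands stays cheap and
works even when an extension's optional dependencies are not installed.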

## Testing Checklist

Add tests for:

- descriptor serialization
- registry lookup and duplicate handling
- missing optional dependency diagnostics
- canonical result validity
- provenance shape
- CLI output envelope if public commands are exposed
- compatibility shim if replacing an existing API

When refactoring an existing feature, add characterization tests first, then
migrate implementation behind descriptors or registries.

## Boundary With Workflows

Internal extensions describe what Markitect can do. Workflows describe how a
user combines capabilities for a concrete document pipeline.

An extension may expose a workflow step later, but it should not depend on the
workflow engine to be useful from the library or CLI.
149
docs/internal-extension-framework.md
Normal file
@@ -0,0 +1,149 @@
# Internal Extension Framework

## Purpose

Markitect has reached the point where optional features are useful but are
starting to concentrate wiring in central modules. Query engines, processors,
backend stores, references, contract checks, templates, generation adapters, and
CLI commands all need some combination of registration, capability metadata,
diagnostics, provenance, and optional dependency handling.

The internal extension framework should make those seams explicit without
turning the project into a heavy external plugin platform.

## Boundary

This framework is about internal extensibility:

```text
feature descriptor -> registry -> processing request/context/result
                   -> diagnostics/provenance/capabilities
                   -> CLI/API/backend integration
```

It is not the same as `MKTT-WP-0011` dataflow workflows. Workflows organize
business-facing processing steps for a document pipeline. The extension
framework organizes how Markitect itself exposes and composes capabilities.

## Extension Taxonomy

| Kind | Examples | Primary Contract |
| --- | --- | --- |
| `query-engine` | selector, JSONPath | document/data in, matches out |
| `processor` | identity, uppercase, include | fenced block in, processed result out |
| `backend` | local SQLite index | snapshots/index/search storage |
| `reference-provider` | section, region, fence, line | address in, content units out |
| `validator` | schema, contract, section assertion | document/context in, diagnostics out |
| `template-engine` | deterministic templates | template/data in, Markdown out |
| `generation-adapter` | provider-neutral assisted generation | request in, generated candidate out |
| `cli-group` | cache, backend, ref, class | command descriptors or registration hook |
| `render-export` | future Quarkdown/export adapters | Markdown source in, rendered/exported artifact out |
| `document-function` | future function layer | function call in, typed document value out |

## Canonical Lifecycle

An extension should be describable before it runs:

1. Register descriptor.
2. Check optional dependencies.
3. Check capabilities and policy labels.
4. Build processing context.
5. Execute operation.
6. Normalize result, diagnostics, provenance, and trace data.
7. Expose output through library API, CLI, backend, or workflow layer.

The framework should allow deterministic extensions to stay simple. Assisted,
external, networked, or filesystem-mutating extensions should declare that
explicitly before execution.
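
Steps 1 through 6 can be sketched in a few dozen lines. Everything here is
hypothetical and simplified for illustration: the `Descriptor` and `Result`
classes, the module-level `REGISTRY`, and the `extension.policy_denied` code
are assumptions, not the framework's actual types or diagnostics.

```python
from dataclasses import dataclass, field
from importlib.util import find_spec
from typing import Any, Callable, Dict, List


@dataclass
class Descriptor:
    id: str
    run: Callable[[Any, Dict[str, Any]], Any]
    requires_modules: List[str] = field(default_factory=list)
    safety_flags: List[str] = field(default_factory=list)


@dataclass
class Result:
    output: Any = None
    diagnostics: List[str] = field(default_factory=list)
    provenance: Dict[str, Any] = field(default_factory=dict)

    @property
    def valid(self) -> bool:
        return not self.diagnostics


REGISTRY: Dict[str, Descriptor] = {}


def register(descriptor: Descriptor) -> None:
    # Step 1: register the descriptor and reject duplicate ids.
    if descriptor.id in REGISTRY:
        raise ValueError(f"duplicate extension id: {descriptor.id}")
    REGISTRY[descriptor.id] = descriptor


def execute(extension_id: str, payload: Any, allowed_flags: List[str]) -> Result:
    descriptor = REGISTRY[extension_id]
    # Step 2: check optional dependencies without importing them.
    missing = [m for m in descriptor.requires_modules if find_spec(m) is None]
    if missing:
        return Result(diagnostics=[f"extension.missing_dependency: {', '.join(missing)}"])
    # Step 3: compare declared safety flags with what the caller allows.
    denied = [f for f in descriptor.safety_flags if f not in allowed_flags]
    if denied:
        return Result(diagnostics=[f"extension.policy_denied: {', '.join(denied)}"])
    # Step 4: build a (deliberately tiny) processing context.
    context = {"extension": descriptor.id}
    # Step 5: execute the operation.
    output = descriptor.run(payload, context)
    # Step 6: normalize output and provenance into one result envelope.
    return Result(output=output, provenance={"operation": descriptor.id})
```

A deterministic extension declares no modules and no flags, so it passes
straight through steps 2 and 3, which is the "stay simple" property the
lifecycle aims for.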

## Descriptor Shape

The first descriptor model should cover:

- stable id
- kind
- version
- summary
- implementation reference
- capability declarations
- optional dependency declarations
- safety flags
- input and output contract names
- diagnostics namespace
- provenance operation prefix
- documentation and example links
- CLI affordances where applicable

Descriptors are not meant to replace implementation modules. They are the small
declarative surface that lets Markitect inspect, list, validate, and compose
capabilities consistently.
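
To make "small declarative surface" concrete, the fields above fit in one
plain dataclass that serializes to JSON for listing and inspection. This is a
sketch under assumptions: `DescriptorSketch`, its field names, types, and
defaults are illustrative and may differ from the real descriptor model.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class DescriptorSketch:
    """Hypothetical descriptor holding only declarative metadata."""

    id: str
    kind: str
    summary: str
    version: str = "0"
    implementation: str = ""
    capabilities: List[str] = field(default_factory=list)
    optional_dependencies: List[str] = field(default_factory=list)
    safety_flags: List[str] = field(default_factory=list)
    input_contract: str = ""
    output_contract: str = ""
    diagnostics_namespace: str = ""
    provenance_prefix: str = ""
    docs: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)
    cli: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize with sorted keys so output is stable across runs."""
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping every field JSON-friendly means a descriptor can be listed, diffed,
and validated without importing the implementation module it refers to.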

## Processing Model

The canonical processing model should define a small set of shared envelopes:

- `ProcessingRequest`: operation id, input payload, options, scope
- `ProcessingContext`: root, source path, namespaces, variables, policy, backend
  handles, and caller metadata
- `ProcessingResult`: output payload, diagnostics, provenance, dependencies,
  trace events, and validity
- `ProcessingDiagnostic`: severity, code, message, source, help
- `ProcessingCapability`: declared feature or permission requirement
- `ProcessingProvenance`: operation, source identity, snapshot/content hashes,
  dependencies, backend/provider metadata

Subsystem-specific types may remain richer. The canonical model is the bridge,
not a forced replacement for every local dataclass.
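
A minimal sketch of three of these envelopes shows how validity can be derived
from diagnostics instead of stored separately. The `*Sketch` classes, the
assumed severity levels, and the `processor.invalid_input` code are all
hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class DiagnosticSketch:
    severity: str  # assumed levels: "error", "warning", "info"
    code: str
    message: str
    source: Optional[str] = None
    help: Optional[str] = None


@dataclass(frozen=True)
class RequestSketch:
    operation: str
    payload: Any
    options: Dict[str, Any] = field(default_factory=dict)
    scope: Optional[str] = None


@dataclass
class ResultSketch:
    output: Any = None
    diagnostics: List[DiagnosticSketch] = field(default_factory=list)
    provenance: Dict[str, Any] = field(default_factory=dict)
    dependencies: List[str] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)

    @property
    def valid(self) -> bool:
        # A result is valid when it carries no error-level diagnostics.
        return all(d.severity != "error" for d in self.diagnostics)


def run_uppercase(request: RequestSketch) -> ResultSketch:
    """Toy processor: uppercase text input, otherwise return an error diagnostic."""
    if not isinstance(request.payload, str):
        problem = DiagnosticSketch("error", "processor.invalid_input", "Expected text input.")
        return ResultSketch(diagnostics=[problem])
    return ResultSketch(
        output=request.payload.upper(),
        provenance={"operation": request.operation},
    )
```

Warnings leave `valid` true in this sketch, so callers can surface them
without treating the output as unusable.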

## Registration Strategy

Start with in-package registration:

```text
markitect_tool/extensions/
  query_selector.py
  query_jsonpath.py
  backend_local_sqlite.py
  processors_builtin.py
```

Each module exposes one or more descriptors plus a registration function. The
root registry can be assembled explicitly at import time or by a small internal
discovery list. Package entry points can be added later if external extension
packages become a real requirement.
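
The "small internal discovery list" option might look like the sketch below.
The registration functions, the dictionary-based registry, and the descriptor
contents are hypothetical; in the layout above each function would live in its
own module rather than in one file.

```python
from typing import Callable, Dict, List

Registry = Dict[str, Dict[str, str]]


def register_query_selector(registry: Registry) -> None:
    registry["query.selector"] = {"kind": "query-engine", "summary": "Selector queries."}


def register_query_jsonpath(registry: Registry) -> None:
    registry["query.jsonpath"] = {"kind": "query-engine", "summary": "JSONPath queries."}


# Explicit discovery list: adding an extension means adding one entry here,
# rather than editing several central modules.
DISCOVERY: List[Callable[[Registry], None]] = [
    register_query_selector,
    register_query_jsonpath,
]


def build_registry() -> Registry:
    """Assemble a fresh root registry by calling each registration function."""
    registry: Registry = {}
    for register in DISCOVERY:
        register(registry)
    return registry
```

Building a fresh registry per call, instead of mutating a global, keeps tests
isolated and makes the full set of built-in extensions visible in one place.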

See `docs/extension-authoring.md` for the extension authoring checklist and
descriptor template.

## Compatibility Rules

The refactor must preserve:

- current library APIs such as `query_document`
- current CLI commands and output envelopes
- current diagnostic codes where users may rely on them
- current provenance operation strings unless intentionally deprecated
- optional dependency behavior for JSONPath and future adapters
- cache/index file compatibility unless a migration is documented

The first implementation adds canonical processing envelopes, extension
descriptors, registries, lifecycle callbacks, query-engine registry shims,
built-in extension descriptors, and CLI command specs while preserving existing
public commands.

## Characterization Coverage

Before refactoring, lock down:

- selector query and extraction
- optional JSONPath diagnostics
- processor registry behavior and provenance
- backend manifest registry and capability checks
- local SQLite snapshot/index/search behavior
- content reference resolution
- representative CLI command envelopes
- provenance and diagnostic shapes

These tests are deliberately a little redundant with unit tests. Their job is
to protect the current public behavior while internals move behind extension
descriptors and registries.
@@ -34,7 +34,7 @@ and descriptions mirror the operational view.
 | `MKTT-WP-0006` | complete | done | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T005` | Optional backend fabric is complete: manifests, capabilities, snapshot identity, interfaces, registry, provenance, and read-only CLI scaffolding. |
 | `MKTT-WP-0010` | complete | done | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Content references, processors, explode/implode, weave/tangle, content classes, and migration examples are complete as the first WP-0010 extension layer. |
 | `MKTT-WP-0007` | complete | done | `MKTT-WP-0006` | Advanced query and local index backend is complete: AST inspection, optional JSONPath, SQLite snapshots/metadata, FTS5 search, incremental refresh, and local index CLI. |
-| `MKTT-WP-0013` | P1 | todo | `MKTT-WP-0003`, `MKTT-WP-0004`, `MKTT-WP-0006`, `MKTT-WP-0007`, `MKTT-WP-0010` | Internal extension framework and canonical processing model: characterize current behavior, add registries/descriptors/callbacks, and reduce central wiring before heavier runtime/workflow work. |
+| `MKTT-WP-0013` | complete | done | `MKTT-WP-0003`, `MKTT-WP-0004`, `MKTT-WP-0006`, `MKTT-WP-0007`, `MKTT-WP-0010` | Internal extension framework is complete: characterization tests, canonical processing model, descriptors, registries, lifecycle callbacks, query-engine registry, built-in extension catalog, CLI command specs, and authoring guide. |
 | `MKTT-WP-0005` | P2 | todo | `MKTT-WP-0003`, `MKTT-WP-0004` | Pick up when generation/form/context or semantic assessment pressure appears. |
 | `MKTT-WP-0011` | P2 | todo | `MKTT-WP-0003`; task-level triggers: `MKTT-WP-0010-T001`, `MKTT-WP-0010-T005` | Declarative Markdown dataflow workflows: source extraction, deterministic/assisted processing, and multi-output generation. |
 | `MKTT-WP-0009` | P2 | todo | `MKTT-WP-0006` | Establish access-control gateway before security-sensitive cache/context use. |