tegwick d977f9e67c Extensible canonical internal processing refactoring

2026-05-04 11:06:11 +02:00

id, type, title, domain, status, owner, topic_slug, planning_priority, planning_order, depends_on_workplans, related_workplans, created, updated, state_hub_workstream_id

MKTT-WP-0013

workplan

Internal Extension Framework and Canonical Processing Model

markitect

done

markitect-tool

markitect

MKTT-WP-0003

MKTT-WP-0004

MKTT-WP-0006

MKTT-WP-0007

MKTT-WP-0010

MKTT-WP-0005

MKTT-WP-0009

MKTT-WP-0011

MKTT-WP-0012

2026-05-04

5eea103f-f584-4360-b7e3-c5b09a4814bd

MKTT-WP-0013: Internal Extension Framework and Canonical Processing Model

Purpose

Create an internal extension framework that lets optional Markitect features register well-contained implementations, descriptors, callbacks, diagnostics, capabilities, and CLI/query integration points without repeatedly expanding central modules.

This workplan is about internal extensibility and framework shape. It is distinct from MKTT-WP-0011, which organizes business-facing dataflow pipelines.

Background

Recent implementation work added valuable optional functionality:

processor registry and deterministic fenced-block processors
backend manifests and local SQLite backend
selector and optional JSONPath query engines
FTS search over indexed sections and blocks
content references, literate workflows, explode/implode, and content classes

The functionality is working, but extension pressure is visible. Optional features still tend to require edits in central files such as CLI wiring, query exports, backend exports, and shared command dispatch. That is acceptable early in a small toolkit, but it becomes a maintenance liability if Markitect is meant to grow into a research lab for sophisticated Markdown/knowledge systems.

The target architecture should preserve the current slim core while making extensions feel first-class:

specification file + implementation module + registration descriptor
  -> extension registry
  -> canonical processing request/context/result
  -> callbacks, diagnostics, provenance, capabilities
  -> CLI/API/query/backend integration

Decision

Yes, restructure, but do it deliberately:

Add characterization tests for the current behaviors before refactoring.
Define a canonical processing model that extensions can share.
Introduce extension descriptors and registries with minimal central wiring.
Migrate one vertical slice at a time.
Keep compatibility aliases and existing CLI commands stable.

Avoid a plugin system that is more elaborate than the project needs. The first version should support internal extension isolation and later package-level discovery without forcing dynamic loading or external dependency installation.

P13.1 - Architecture note and extension taxonomy

id: MKTT-WP-0013-T001
status: done
priority: high
state_hub_task_id: "ba106001-c953-435a-8012-0dd83533d309"

Define the internal extension taxonomy:

query engines
processors
backends and index stores
references and content-unit providers
validators and contract checks
templates/generation adapters
CLI command groups
future render/export adapters
future document functions

Output: architecture note explaining extension boundaries, lifecycle, registration semantics, and relationship to MKTT-WP-0011.

Implemented: docs/internal-extension-framework.md defines the internal extension boundary, extension taxonomy, canonical lifecycle, descriptor shape, processing model, registration strategy, compatibility rules, and characterization coverage.

P13.2 - Add characterization tests before refactor

id: MKTT-WP-0013-T002
status: done
priority: high
state_hub_task_id: "a270cb7a-4dbf-4562-b0ab-d5dda5124086"

Lock down current behavior before moving code behind registries:

selector query and extraction
optional JSONPath diagnostics
processor registry behavior
backend manifest registry
local SQLite snapshot/index/search behavior
content reference resolution
key CLI commands and output envelopes
provenance and diagnostics shapes

Output: focused characterization tests that can fail loudly if refactoring changes public behavior.

Implemented: tests/test_extension_characterization.py covers selector query/extraction, JSONPath optional-dependency diagnostics, processor provenance and diagnostics, backend manifest/capability behavior, local snapshot/index/search behavior, content references, and representative CLI output envelopes.
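A characterization test of this kind can be sketched as follows. This is an illustrative shape only, not the content of tests/test_extension_characterization.py: `query_envelope`, the document layout, and the envelope keys are hypothetical stand-ins for the real code under test.

```python
import json


def query_envelope(document, selector):
    # Hypothetical stand-in for the code under test; the real entry points
    # (for example query_document) are not reproduced here.
    matches = [s for s in document["sections"] if s["heading"] == selector]
    return {"ok": True, "matches": matches, "diagnostics": []}


# Captured once from the pre-refactor behavior and then frozen.
EXPECTED = '{"diagnostics":[],"matches":[{"heading":"Purpose","level":2}],"ok":true}'


def test_query_envelope_is_stable():
    document = {
        "sections": [
            {"heading": "Purpose", "level": 2},
            {"heading": "Background", "level": 2},
        ]
    }
    # Canonical JSON (sorted keys, fixed separators) makes the comparison
    # independent of dict ordering, so only real behavior changes fail.
    actual = json.dumps(
        query_envelope(document, "Purpose"), sort_keys=True, separators=(",", ":")
    )
    assert actual == EXPECTED
```

Comparing a canonical serialization rather than individual fields means an added, removed, or renamed envelope key fails the test as loudly as a changed value.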

P13.3 - Define canonical processing model

id: MKTT-WP-0013-T003
status: done
priority: high
state_hub_task_id: "8c88b9a7-1e8d-401c-ad09-8b5a19ccba14"

Create shared framework types for extension execution:

ProcessingRequest
ProcessingContext
ProcessingResult
ProcessingDiagnostic
ProcessingCapability
ProcessingProvenance
optional ProcessingTrace

The model should support deterministic, assisted, external, and read-only operations without making every extension depend on every subsystem.

Output: framework module, tests, and migration guide for current subsystems.

Implemented: markitect_tool.extension.processing defines ProcessingRequest, ProcessingContext, ProcessingResult, ProcessingDiagnostic, ProcessingCapability, ProcessingProvenance, and ProcessingTrace, with serialization, cache-key, validity, provenance, trace, and error normalization tests.
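A minimal sketch of how such shared types could look, assuming plain dataclasses. Only the class names come from this workplan; every field, the `cache_key` method, and the `ok` property are assumptions made for illustration and may differ from markitect_tool.extension.processing.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class ProcessingDiagnostic:
    code: str                 # assumed: namespaced diagnostic code
    message: str
    severity: str = "error"   # assumed levels: "error", "warning", "info"


@dataclass(frozen=True)
class ProcessingRequest:
    extension_id: str
    operation: str
    payload: dict = field(default_factory=dict)

    def cache_key(self) -> str:
        # Canonical JSON (sorted keys) hashed with SHA-256, so equal
        # requests yield equal keys regardless of dict insertion order.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ProcessingResult:
    value: Any = None
    diagnostics: tuple = ()

    @property
    def ok(self) -> bool:
        # A result is valid when it carries no error-level diagnostic.
        return not any(d.severity == "error" for d in self.diagnostics)

    def to_dict(self) -> dict:
        return {
            "ok": self.ok,
            "value": self.value,
            "diagnostics": [asdict(d) for d in self.diagnostics],
        }
```

Keeping these types free of imports from other subsystems is what lets a read-only query engine and an external processor share the same envelope.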

P13.4 - Implement extension descriptors and registries

id: MKTT-WP-0013-T004
status: done
priority: high
state_hub_task_id: "3fb2fe81-9819-4679-99d0-ad60ac9e8277"

Define descriptor objects for extensions:

stable id
kind
version
implementation reference
capabilities
optional dependencies
safety/policy flags
input and output contracts
CLI/API affordances
docs/examples links

Implement registries that can be assembled from in-package extension modules and, later, package entry points.

Output: descriptor schema, registry API, duplicate/missing dependency diagnostics, and tests.

Implemented: markitect_tool.extension.registry defines ExtensionDescriptor, OptionalDependency, ExtensionRegistry, ExtensionDependencyCheck, and ExtensionRegistryError, with descriptor serialization, kind/capability lookup, duplicate-id diagnostics, dependency checks, and factory instantiation tests.
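The descriptor and registry idea can be illustrated with a small sketch. The class names match those listed above, but the fields, method names, and error messages are assumptions and not the actual API of markitect_tool.extension.registry.

```python
from dataclasses import dataclass
from typing import Callable


class ExtensionRegistryError(Exception):
    """Raised for registration problems such as duplicate or unknown ids."""


@dataclass(frozen=True)
class ExtensionDescriptor:
    id: str                          # stable id
    kind: str                        # e.g. "query-engine", "processor", "backend"
    version: str
    factory: Callable[[], object]    # implementation reference (assumed shape)
    capabilities: frozenset = frozenset()


class ExtensionRegistry:
    def __init__(self):
        self._by_id = {}

    def register(self, descriptor):
        # Duplicate ids are rejected explicitly instead of silently replaced.
        if descriptor.id in self._by_id:
            raise ExtensionRegistryError(f"duplicate extension id: {descriptor.id}")
        self._by_id[descriptor.id] = descriptor

    def by_kind(self, kind):
        # Sorted by id so lookups are deterministic across runs.
        found = [d for d in self._by_id.values() if d.kind == kind]
        return sorted(found, key=lambda d: d.id)

    def with_capability(self, capability):
        found = [d for d in self._by_id.values() if capability in d.capabilities]
        return sorted(found, key=lambda d: d.id)

    def create(self, extension_id):
        # Instantiate lazily through the factory; nothing is loaded at import time.
        if extension_id not in self._by_id:
            raise ExtensionRegistryError(f"unknown extension id: {extension_id}")
        return self._by_id[extension_id].factory()
```

Because registration is an explicit call on an explicit object, a registry can be assembled from in-package modules today and from package entry points later without changing the descriptor shape.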

P13.5 - Add callback hooks and execution lifecycle

id: MKTT-WP-0013-T005
status: done
priority: medium
state_hub_task_id: "be8f2056-f413-44f9-be9c-6046c34e307e"

Add lifecycle callbacks for:

before execution
after success
after diagnostic failure
provenance capture
cache key calculation
capability/policy checks
trace/event emission

Callbacks must be explicit and deterministic by default. They should not become hidden global behavior.

Output: callback model and tests with fake extensions.

Implemented: ExtensionLifecycle and ExtensionExecutor provide explicit before/success/failure/after callbacks, dependency checks before execution, result type normalization, execution trace emission, and fake-extension tests.
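An explicit callback lifecycle might be sketched like this. The names `ExtensionLifecycle` and `ExtensionExecutor` come from this workplan; the constructor arguments, the dict-shaped result, and the `trace` list are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExtensionLifecycle:
    # Every hook is optional and passed in explicitly; nothing is global.
    before: Optional[Callable[[dict], None]] = None
    on_success: Optional[Callable[[dict], None]] = None
    on_failure: Optional[Callable[[dict], None]] = None
    after: Optional[Callable[[dict], None]] = None


class ExtensionExecutor:
    def __init__(self, lifecycle):
        self.lifecycle = lifecycle
        self.trace = []  # ordered event names, usable as an execution trace

    def execute(self, handler, request):
        self._emit("before", self.lifecycle.before, request)
        try:
            value = handler(request)
        except Exception as exc:
            # Normalize the exception into a diagnostic instead of propagating it.
            diagnostic = {"code": type(exc).__name__, "message": str(exc)}
            result = {"ok": False, "value": None, "diagnostics": [diagnostic]}
            self._emit("failure", self.lifecycle.on_failure, result)
        else:
            result = {"ok": True, "value": value, "diagnostics": []}
            self._emit("success", self.lifecycle.on_success, result)
        self._emit("after", self.lifecycle.after, result)
        return result

    def _emit(self, event, callback, payload):
        self.trace.append(event)
        if callback is not None:
            callback(payload)
```

A fake extension is then just a function: `ExtensionExecutor(ExtensionLifecycle()).execute(lambda request: request["x"] + 1, {"x": 1})` returns a success result, and a handler that raises yields a failure result with one diagnostic.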

P13.6 - Refactor query engines behind registry

id: MKTT-WP-0013-T006
status: done
priority: high
state_hub_task_id: "0226c1d1-f583-43ad-8e20-f75f9790e17d"

Move selector and JSONPath engines behind a query-engine registry while preserving query_document, extract_document, mkt query, and mkt extract compatibility.

Output: registered selector/jsonpath engines, compatibility shims, and tests.

Implemented: selector and JSONPath engines now live behind QueryEngineRegistry descriptors, with compatibility shims for query_document, extract_document, query_document_jsonpath, and extract_document_jsonpath; CLI behavior remains unchanged.

P13.7 - Refactor processors and local backend as registered extensions

id: MKTT-WP-0013-T007
status: done
priority: medium
state_hub_task_id: "a966dcbb-3ae8-47bf-85c8-4ba6ddcf7a31"

Adapt existing processor and backend infrastructure to expose descriptors and registry entries without changing their external behavior.

Focus areas:

deterministic fenced processors
local SQLite index backend
backend manifests
FTS search
snapshot refresh planning

Output: extension-backed processor/backend registration and regression tests.

Implemented: builtin_extension_registry() now exposes built-in query engines, deterministic processors, and the local SQLite backend as extension descriptors with capabilities, safety flags, CLI affordances, docs/examples, diagnostic namespaces, and provenance prefixes.

P13.8 - Refactor CLI composition to reduce central wiring

id: MKTT-WP-0013-T008
status: done
priority: medium
state_hub_task_id: "3e88ca62-8dba-4632-b5d0-29827d102322"

Reduce direct growth pressure in cli/main.py by allowing extension modules to register command groups or command specs through a small, testable integration point.

Output: CLI extension hook, migrated command group examples, and unchanged public CLI behavior.

Implemented first integration point: markitect_tool.cli.extensions derives CliCommandSpec declarations from extension descriptors. Built-in query, processor, and backend descriptors now expose command affordances such as mkt query, mkt process, mkt cache index, and mkt search without making the CLI module the only source of command metadata.
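Deriving command metadata from descriptors could look roughly like the following. `CliCommandSpec` is named in this workplan; its fields, the `derive_cli_specs` helper, the dict-shaped descriptors, and the extension ids are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CliCommandSpec:
    command: str        # e.g. "mkt query"
    extension_id: str   # descriptor that declared the affordance


def derive_cli_specs(descriptors):
    """Collect command specs from the 'cli' affordances of each descriptor."""
    specs = []
    for descriptor in descriptors:
        for command in descriptor.get("cli", ()):
            specs.append(CliCommandSpec(command, descriptor["id"]))
    # Sorted so help output and tests do not depend on registration order.
    return sorted(specs, key=lambda s: (s.command, s.extension_id))


# Hypothetical descriptors; the commands are the ones named in this workplan.
BUILTIN = [
    {"id": "backend.local-sqlite", "cli": ["mkt cache index", "mkt search"]},
    {"id": "query.selector", "cli": ["mkt query"]},
    {"id": "processor.fenced", "cli": ["mkt process"]},
]

if __name__ == "__main__":
    for spec in derive_cli_specs(BUILTIN):
        print(f"{spec.command}  <- {spec.extension_id}")
```

With this shape, the CLI module iterates over derived specs instead of hard-coding each command, so a new extension contributes its commands through its descriptor alone.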

P13.9 - Document extension authoring conventions

id: MKTT-WP-0013-T009
status: done
priority: medium
state_hub_task_id: "848e2a5e-c32b-4a94-906b-dc6aced4c71b"

Document how a new internal extension should be structured:

specification file
implementation module
registration descriptor
tests
docs/examples
diagnostics and provenance expectations
optional dependency handling
policy/capability declarations

Output: extension authoring guide and one small template/example extension.

Implemented: docs/extension-authoring.md documents extension layout, descriptor template, optional dependency declarations, processing envelopes, diagnostics, provenance, safety/policy metadata, CLI affordances, tests, and the boundary with business-facing workflows.
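As one illustration of optional dependency handling, an extension can declare its optional imports and have them checked before execution rather than failing at import time. `OptionalDependency` is named in this workplan; its fields, the `check_dependencies` helper, and the diagnostic code are assumptions. The check uses the standard library's `importlib.util.find_spec`, which returns `None` for a missing top-level module.

```python
import importlib.util
from dataclasses import dataclass


@dataclass(frozen=True)
class OptionalDependency:
    module: str              # top-level importable module name
    install_hint: str = ""   # assumed field: guidance shown to the user


def check_dependencies(dependencies):
    """Return one diagnostic per missing optional dependency."""
    diagnostics = []
    for dep in dependencies:
        # find_spec locates the module without importing it. Only top-level
        # names are used here, because dotted names import parent packages.
        if importlib.util.find_spec(dep.module) is None:
            diagnostics.append(
                {
                    "code": "extension.dependency.missing",  # hypothetical code
                    "message": f"optional module '{dep.module}' is not installed",
                    "hint": dep.install_hint,
                }
            )
    return diagnostics
```

Returning diagnostics instead of raising keeps a missing optional package a reportable condition for that one extension, leaving the rest of the toolkit usable.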

Exit Criteria

Existing behavior is covered by characterization tests before refactoring.
Optional features can live in well-contained modules with descriptors.
Central CLI/query/backend files stop being the primary integration surface for every new feature.
The canonical processing model provides shared context/result/diagnostic/provenance semantics without overfitting to pipelines.
The framework is clearly distinct from business-facing workflow orchestration.
Existing public commands and library APIs remain compatible or have explicit compatibility shims.
