extension for ref resolve, explode, implode, weave, tangle

2026-05-04 02:25:49 +02:00
parent 8203f50fd5
commit 65bfc1aebf
39 changed files with 3959 additions and 25 deletions

docs/content-classes.md Normal file

@@ -0,0 +1,79 @@
# Content Classes
Date: 2026-05-04
## Purpose
Content classes are data-defined composition rules for reusable document
structures, overlays, and variants. They are not Python inheritance. They are a
deterministic way to combine slots such as sections, assertions, snippets,
processors, and style guidance.
This is the P10.7 resolver spike for future class/object-style workflows.
## Model
A class can declare:
- `extends`: parent classes
- `slots`: structured values to contribute
- `merge_policies`: per-slot merge behavior
Example:
```yaml
classes:
  base-prd:
    slots:
      sections:
        - Problem
        - Decision
  enterprise:
    extends:
      - base-prd
    slots:
      sections:
        - Compliance
    merge_policies:
      sections: append
```
## Linearization
Multiple inheritance uses a C3-style linearization. That gives us:
- deterministic parent ordering
- monotonic inheritance behavior
- explicit diagnostics for cycles, unknown parents, and inconsistent precedence
The resolved class is merged from base to leaf according to the computed
linearization.
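A minimal sketch of C3-style linearization follows. It is not the resolver's actual code: the function name `c3_linearize`, the `parents` mapping shape, and the use of `ValueError` in place of structured diagnostics are all assumptions made for illustration.

```python
from __future__ import annotations


def c3_linearize(name: str, parents: dict[str, list[str]]) -> list[str]:
    """Return a C3 linearization of `name`, leaf first (hypothetical helper)."""

    def merge(sequences: list[list[str]]) -> list[str]:
        result: list[str] = []
        sequences = [list(seq) for seq in sequences if seq]
        while sequences:
            for seq in sequences:
                head = seq[0]
                # A head is acceptable only if it appears in no other tail.
                if not any(head in other[1:] for other in sequences):
                    break
            else:
                raise ValueError("inconsistent precedence")
            result.append(head)
            sequences = [[item for item in seq if item != head] for seq in sequences]
            sequences = [seq for seq in sequences if seq]
        return result

    def linearize(current: str, stack: tuple[str, ...]) -> list[str]:
        if current in stack:
            raise ValueError(f"cycle through {current!r}")
        if current not in parents:
            raise ValueError(f"unknown class {current!r}")
        bases = parents[current]
        parent_orders = [linearize(base, stack + (current,)) for base in bases]
        return [current] + merge(parent_orders + [list(bases)])

    return linearize(name, ())


example = {"base-prd": [], "enterprise": ["base-prd"], "enterprise-prd": ["enterprise"]}
order = c3_linearize("enterprise-prd", example)
# Reversing the leaf-first order gives the base-to-leaf merge order.
merge_order = list(reversed(order))
```

The sketch returns the order leaf first, so a base-to-leaf merge iterates over the reversed list.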
## Merge Policies
Initial policies:
- `replace`
- `append`
- `prepend`
- `deep_merge`
- `error_on_conflict`
Unknown policies and invalid value shapes produce diagnostics.
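The policy names above come from this document, but their exact semantics are not spelled out here. The sketch below shows one plausible reading. The helper names `merge_slot` and `deep_merge` are hypothetical, and exceptions stand in for the diagnostics the real resolver produces.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any


def deep_merge(base: dict, overlay: dict) -> dict:
    """Recursively merge mappings; overlay values win on scalar conflicts."""
    merged = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def merge_slot(policy: str, base: Any, incoming: Any) -> Any:
    """Combine a parent slot value with a child slot value (assumed semantics)."""
    if policy == "replace":
        return deepcopy(incoming)
    if policy in ("append", "prepend"):
        if not isinstance(base, list) or not isinstance(incoming, list):
            raise TypeError(f"{policy} expects list values")
        return base + incoming if policy == "append" else incoming + base
    if policy == "deep_merge":
        if not isinstance(base, dict) or not isinstance(incoming, dict):
            raise TypeError("deep_merge expects mapping values")
        return deep_merge(base, incoming)
    if policy == "error_on_conflict":
        if base != incoming:
            raise ValueError("conflicting slot values")
        return deepcopy(base)
    raise ValueError(f"unknown merge policy: {policy}")
```

Under these assumptions, appending `["Compliance"]` to `["Problem", "Decision"]` keeps the parent sections first, which matches the base-to-leaf merge order described earlier.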
## CLI
Resolve a class:
```bash
mkt class resolve examples/classes/prd-classes.yaml enterprise-prd
```
JSON/YAML output includes the linearization, merged slots, and diagnostics.
## Extension Boundary
The current resolver does not yet instantiate Markdown documents or inject
snippets. It establishes the deterministic inheritance and merge floor. Later
work can connect resolved slots to contracts, references, processors, and
generation plans.

docs/content-references.md Normal file

@@ -0,0 +1,139 @@
# Content References
Date: 2026-05-04
## Purpose
Content references are the first WP-0010 extension layer. They give Markitect a
shared way to name and resolve Markdown content units without changing the
existing parser, query, transform, compose, include, contract, or cache APIs.
The goal is a small resolver that later features can reuse:
- includes can accept references as well as paths
- explode/implode can write manifests with stable unit IDs
- processors can receive typed units and dependency edges
- tangle/weave can address chunks and generated outputs
- cache and access-control backends can index the same IDs
## Reference Syntax
References are compact strings:
```text
path/to/file.md
path/to/file.md#section:introduction
path/to/file.md::sections[heading=Decision]
std:clauses/payment.md
std:clauses/payment.md#payment-terms
std:clauses/payment.md#region:boilerplate
std:clauses/payment.md#tag:legal
#local-section
```
The parts are:
- `namespace:`: optional namespace declared in frontmatter
- `path`: a Markdown file path relative to the current document, or relative to
the namespace target
- `#fragment`: optional unit lookup inside the target document
- `::selector`: optional existing Markitect query selector
Fragments and selectors are mutually exclusive during resolution. Selectors are
delegated to the existing query engine, which keeps this layer small and avoids
inventing a second query language.
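To make the grammar concrete, here is a small sketch that splits a reference string into the four parts listed above. It is an illustration only: `ParsedReference` and `split_reference` are hypothetical names, not the `parse_reference` API exported by the package, and the tie-breaking rule (whichever of `#` or `::` comes first wins) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedReference:
    namespace: str | None
    path: str
    fragment: str | None
    selector: str | None


def split_reference(text: str) -> ParsedReference:
    """Split `[namespace:]path[#fragment | ::selector]` into its parts."""
    hash_at = text.find("#")
    selector_at = text.find("::")
    fragment = selector = None
    head = text
    # Fragment and selector are mutually exclusive; the first marker wins.
    if selector_at != -1 and (hash_at == -1 or selector_at < hash_at):
        head, selector = text[:selector_at], text[selector_at + 2:]
    elif hash_at != -1:
        head, fragment = text[:hash_at], text[hash_at + 1:]
    namespace = None
    if ":" in head:
        namespace, head = head.split(":", 1)
    return ParsedReference(namespace, head, fragment, selector)
```

An empty `path` in the result corresponds to a local reference such as `#local-section`, which targets the current document.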
## Namespaces
Namespaces live in Markdown frontmatter:
```yaml
---
namespaces:
  std: ./standard
  product: ../product-docs
---
```
Namespace keys may be written with or without a trailing colon. Namespace values
are string paths. Relative namespace paths resolve under the resolver root. All
resolved file paths must stay inside that root.
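The two rules above (optional trailing colon on keys, and containment inside the resolver root) can be sketched as follows. The function names are hypothetical and the real resolver reports diagnostics rather than raising `ValueError`; this only illustrates the containment check.

```python
from __future__ import annotations

from pathlib import Path


def normalize_namespaces(raw: dict[str, str]) -> dict[str, str]:
    """Accept keys written as `std` or `std:` (assumed normalization)."""
    return {key.rstrip(":"): value for key, value in raw.items()}


def resolve_in_namespace(
    root: Path, namespaces: dict[str, str], namespace: str, relative: str
) -> Path:
    """Resolve `relative` under a namespace target and enforce the root boundary."""
    root = root.resolve()
    candidate = (root / namespaces[namespace] / relative).resolve()
    # Path.is_relative_to requires Python 3.9 or newer.
    if not candidate.is_relative_to(root):
        raise ValueError(f"{candidate} escapes resolver root {root}")
    return candidate
```

Resolving both paths before comparing them means `..` segments and symlinks cannot be used to step outside the root.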
## Content Units
The resolver currently emits these unit kinds:
- `document`: full Markdown file
- `section`: heading-led Markdown section
- `heading`: heading line
- existing query kinds such as `frontmatter`, `block`, `metrics`, or `section`
Each unit includes:
- `unit_id`: stable local ID
- `kind`
- `source_path`
- source line span when available
- `name`
- `content_hash`
- raw text
- metadata from the source or query match
Heading and section IDs use an explicit trailing heading ID when present:
```markdown
## Payment Terms {#payment-terms}
```
Otherwise the resolver derives a slug from the heading text and adds numeric
suffixes for collisions.
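The ID rule can be illustrated with a short sketch. The slug alphabet and the suffix format (`-2`, `-3`, ...) are assumptions; this document only states that explicit trailing IDs win and that collisions get numeric suffixes. `heading_ids` is a hypothetical helper.

```python
from __future__ import annotations

import re

# Matches an explicit trailing ID such as "{#payment-terms}".
_HEADING_ID = re.compile(r"\s*\{#([A-Za-z0-9_-]+)\}\s*$")


def heading_ids(headings: list[str]) -> list[str]:
    """Return one ID per heading text, in document order."""
    seen: dict[str, int] = {}
    ids: list[str] = []
    for text in headings:
        match = _HEADING_ID.search(text)
        if match:
            ids.append(match.group(1))
            continue
        slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-") or "section"
        count = seen.get(slug, 0)
        seen[slug] = count + 1
        # First occurrence keeps the bare slug; later ones get a suffix.
        ids.append(slug if count == 0 else f"{slug}-{count + 1}")
    return ids
```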
Named regions use HTML comments so they can live in Markdown and many source
files without changing the rendered document:
```markdown
<!-- mkt:region id="boilerplate" tags="legal reuse" -->
Reusable text.
<!-- /mkt:region -->
```
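As a rough illustration of how such markers could be located, the sketch below scans text with regular expressions. It is deliberately naive (it does not skip markers inside fenced code blocks) and `find_regions` is a hypothetical name, not part of the documented API.

```python
from __future__ import annotations

import re

_REGION = re.compile(
    r"<!--\s*mkt:region\s+(?P<attrs>.*?)\s*-->\n"
    r"(?P<body>.*?)"
    r"<!--\s*/mkt:region\s*-->",
    re.DOTALL,
)
_ATTR = re.compile(r'(\w+)="([^"]*)"')


def find_regions(text: str) -> list[dict]:
    """Collect region id, tags, and body text from comment markers."""
    regions = []
    for match in _REGION.finditer(text):
        attrs = dict(_ATTR.findall(match.group("attrs")))
        regions.append(
            {
                "id": attrs.get("id"),
                "tags": attrs.get("tags", "").split(),
                "text": match.group("body").rstrip("\n"),
            }
        )
    return regions
```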
Fenced blocks can be addressed when their info string includes an ID:
````markdown
```python {#load-config tags="code setup" tangle="src/config.py"}
def load_config():
    return {}
```
````
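The info string in the example carries a language, an ID, and key/value attributes. A minimal parsing sketch, assuming the attribute block always has the form `{#id key="value" ...}`, might look like this. `parse_info_string` is a hypothetical helper, not the package's parser.

```python
from __future__ import annotations

import re

_INFO = re.compile(r"^(?P<lang>[^\s{]*)\s*(?:\{(?P<attrs>.*)\})?\s*$")
_TOKEN = re.compile(r'#(?P<id>[\w-]+)|(?P<key>[\w-]+)="(?P<value>[^"]*)"')


def parse_info_string(info: str) -> dict:
    """Split a fence info string into language, id, and attributes."""
    match = _INFO.match(info.strip())
    if match is None:
        # Unrecognized shape: treat the whole string as the language.
        return {"language": info.strip(), "id": None, "attributes": {}}
    result = {"language": match.group("lang"), "id": None, "attributes": {}}
    for token in _TOKEN.finditer(match.group("attrs") or ""):
        if token.group("id"):
            result["id"] = token.group("id")
        else:
            result["attributes"][token.group("key")] = token.group("value")
    return result
```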
Supported fragments now include:
- `#section:<id-or-heading-slug>`
- `#heading:<id-or-heading-slug>`
- `#region:<id>`
- `#fence:<id>`
- `#tag:<tag>`
- `#line:<start>` or `#line:<start>-<end>`
- `#<id>` as a convenience lookup across sections, regions, fenced blocks, and
headings
## CLI
Resolve a reference from a context document:
```bash
mkt ref resolve examples/references/context.md 'std:clauses.md#payment-terms'
```
JSON and YAML formats include the resolved text and metadata:
```bash
mkt ref resolve examples/references/context.md 'std:clauses.md::sections[heading=Warranty]' --format json
```
## Extension Boundary
This layer is intentionally read-only. It does not replace `mkt include`,
`mkt query`, or `mkt extract`. Instead it defines the address model those tools
can adopt when their next WP-0010 tasks require richer content identity,
processor dependencies, source maps, and reversible manifests.

docs/explode-implode.md Normal file

@@ -0,0 +1,69 @@
# Explode and Implode
Date: 2026-05-04
## Purpose
`mkt explode` and `mkt implode` reintroduce the useful old Markitect
large-document workflow as a slim WP-0010 extension. The design is
manifest-first: the exploded directory is editable, but the manifest preserves
ordering, source spans, heading metadata, hashes, frontmatter, and the selected
layout variant.
This keeps the operation reversible without requiring a database or service.
## Variants
The initial variants are:
- `flat`: writes ordered section files under `sections/`.
- `hierarchical`: writes child section files below parent heading directories.
Both variants preserve the same manifest model. A later semantic variant can
reuse the reference and processor framework once those layers are stable.
## CLI
Explode a document:
```bash
mkt explode docs/source.md --output-dir work/source-exploded
```
Use a hierarchical directory shape:
```bash
mkt explode docs/source.md --output-dir work/source-tree --variant hierarchical
```
Implode the directory back into one Markdown file:
```bash
mkt implode work/source-exploded --output docs/source-rebuilt.md
```
By default `mkt explode` refuses to write into a non-empty output directory. Use
`--force` when an explicit overwrite is intended.
## Manifest
The manifest is written as `markitect-explode.yaml` in the output directory.
It records:
- manifest version
- original source path and SHA-256 hash
- variant
- raw frontmatter block
- ordered entries with file path, kind, unit ID, source line span, heading
metadata, and content hash
Implode reads the manifest entries in order and concatenates the current entry
files. If users edit section files, the rebuilt document reflects those edits
while preserving the original frontmatter and ordering.
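The implode step described above is essentially ordered concatenation. The sketch below assumes a manifest dictionary with `frontmatter`, `entries`, `file`, and `content_hash` keys; those key names and the helper names are illustrative guesses, not the actual `markitect-explode.yaml` schema.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def implode_sketch(directory: Path, manifest: dict) -> str:
    """Concatenate frontmatter and entry files in manifest order."""
    parts = [manifest.get("frontmatter") or ""]
    for entry in manifest["entries"]:
        parts.append((directory / entry["file"]).read_text(encoding="utf-8"))
    return "".join(parts)


def edited_entries(directory: Path, manifest: dict) -> list[str]:
    """List entry files whose current hash differs from the recorded one."""
    changed = []
    for entry in manifest["entries"]:
        current = (directory / entry["file"]).read_text(encoding="utf-8")
        if sha256_text(current) != entry["content_hash"]:
            changed.append(entry["file"])
    return changed
```

Comparing recorded and current hashes is one way a tool could report which sections were edited between explode and implode.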
## Extension Boundary
This implementation is intentionally not semantic yet. It does not infer
contracts, classes, named chunks, or processor outputs. Instead it establishes a
small reversible substrate that later WP-0010 tasks can enrich with regions,
references, processors, source maps, and weave/tangle behavior.


@@ -0,0 +1,79 @@
# Literate Weave and Tangle
Date: 2026-05-04
## Purpose
The literate workflow layer brings a small Knuth-style weave/tangle capability
to Markdown without requiring a separate language. Prose stays in Markdown.
Named code chunks live in fenced blocks. Tangling emits source files.
Weaving keeps the document readable and adds a deterministic chunk index.
## Chunk Syntax
Named chunks use fenced block attributes:
````markdown
```python {#helpers}
def helper():
    return "ready"
```
````
A chunk becomes an output root when it declares `tangle`:
````markdown
```python {#main tangle="src/app.py"}
<<helpers>>
def main():
    return helper()
```
````
Chunk references use noweb-style syntax:
```text
<<helpers>>
```
Whole-line chunk references preserve indentation when expanded.
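A minimal sketch of whole-line chunk expansion is shown below. `expand_chunk` is a hypothetical helper and the exceptions stand in for the structured diagnostics that tangling reports. It shows how the indentation of the reference line can be carried onto every expanded line.

```python
from __future__ import annotations

import re

# A reference must occupy the whole line, optionally indented.
_REF_LINE = re.compile(r"^(?P<indent>[ \t]*)<<(?P<name>[\w-]+)>>[ \t]*$")


def expand_chunk(name: str, chunks: dict[str, str], stack: tuple[str, ...] = ()) -> str:
    """Recursively expand noweb-style references inside a named chunk."""
    if name in stack:
        raise ValueError("cyclic chunk reference: " + " -> ".join(stack + (name,)))
    if name not in chunks:
        raise KeyError(f"missing chunk: {name}")
    lines: list[str] = []
    for line in chunks[name].splitlines():
        match = _REF_LINE.match(line)
        if match is None:
            lines.append(line)
            continue
        indent = match.group("indent")
        expanded = expand_chunk(match.group("name"), chunks, stack + (name,))
        for inner in expanded.splitlines():
            # Blank lines stay blank instead of gaining trailing whitespace.
            lines.append(indent + inner if inner else inner)
    return "\n".join(lines)
```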
## CLI
Tangle files:
```bash
mkt tangle examples/literate/app.md --output-dir build/literate
```
Inspect without writing:
```bash
mkt tangle examples/literate/app.md --format json
```
Weave documentation:
```bash
mkt weave examples/literate/app.md --output build/app-woven.md
```
## Diagnostics
Tangling reports structured diagnostics for missing chunks and cyclic chunk
references. Tangled files are only written by the CLI when the result is valid.
## Extension Boundary
The MVP deliberately keeps the model narrow:
- named fenced blocks
- `tangle="<path>"`
- deterministic document-order concatenation for repeated targets
- noweb-style chunk expansion
- generated chunk index during weave
Future extensions can add richer source maps, processor execution,
language-specific extraction, and class/namespace-aware chunk selection without
changing this initial chunk model.


@@ -0,0 +1,46 @@
# markitect-main WP-0010 Migration Notes
Date: 2026-05-04
## Purpose
This note captures the relevant `markitect-main` ideas that WP-0010 now
preserves in successor form.
The migration is conceptual rather than source-compatible. The successor keeps
Markdown-native behavior and removes old platform, database, infospace, and
service assumptions.
## Parity Map
| Legacy area | Successor shape | Status |
| --- | --- | --- |
| Explode/implode variants | `mkt explode`, `mkt implode`, manifest-first flat/hierarchical variants | Reimplemented |
| Transclusion/includes | `mkt include` for path markers; processor `mkt-include` for reference-backed content | Reimplemented with clearer boundaries |
| Spaces/infospace references | Frontmatter namespaces plus `mkt ref resolve` | Reframed as syntax-layer references |
| Fenced-block processors | Explicit deterministic processor registry | Reimplemented as opt-in extension |
| Literate workflows | `mkt tangle`, `mkt weave`, named fenced chunks, noweb references | Reimplemented as MVP |
| Content classes/overlays | Data-defined classes with C3-style linearization and merge policies | Resolver spike implemented |
## Intentionally Not Migrated
These old concerns stay out of the WP-0010 toolkit layer:
- database-backed infospace lifecycle
- GraphQL/service APIs
- provider-specific LLM execution
- rendering/plugin/browser/editor infrastructure
- project finance, wishlist, and profile tooling
## Migration Examples
Examples live under `examples/migration/`:
- `legacy-explode-source.md`: large document roundtrip via explode/implode.
- `legacy-transclusion-context.md`: namespace-backed reference include.
- `legacy-path-include.md`: simple path-based include marker.
- `legacy-literate.md`: named chunks tangled into source.
The tests in `tests/test_wp0010_migration_examples.py` exercise these files as
successor fixtures. They are deliberately small, but they lock down the
behaviors we most wanted to keep from `markitect-main`.

docs/processors.md Normal file

@@ -0,0 +1,81 @@
# Fenced-Block Processors
Date: 2026-05-04
## Purpose
The processor registry is the deterministic execution boundary for WP-0010.
It lets Markdown fenced blocks opt into named processors while keeping
execution explicit, inspectable, and non-magical.
Processors receive:
- the fenced content unit
- resolver-capable context
- variables and policy maps
Processors return:
- generated content
- optional generated files
- diagnostics
- dependencies
- operation provenance
No built-in processor runs arbitrary code.
## Syntax
A fenced block opts into processing by using an `mkt-<processor>` language:
````markdown
```mkt-uppercase {#shout}
hello
```
````
The processor can also be named with attributes:
````markdown
```markdown {#example processor="identity"}
Rendered as-is by the identity processor.
```
````
## Built-In Processors
Initial deterministic processors:
- `identity`: returns the fenced block content unchanged.
- `uppercase`: returns uppercased content; mainly a registry smoke-test.
- `include`: resolves a `ref` attribute through the content reference resolver.
Reference-backed include:
````markdown
```mkt-include {#payment ref="std:clauses.md#payment-terms"}
```
````
The include processor returns the resolved content, records the target file as
a dependency, and emits operation provenance.
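The registry idea can be sketched in a few lines. None of the names below (`SketchResult`, `SketchRegistry`, `make_include`) are the package's real classes; they only illustrate a name-to-function registry whose processors all return the same result envelope, with the reference resolver injected as a plain callable.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchResult:
    content: str = ""
    diagnostics: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)


Processor = Callable[[str, dict], SketchResult]


class SketchRegistry:
    def __init__(self) -> None:
        self._processors: dict[str, Processor] = {}

    def register(self, name: str, processor: Processor) -> None:
        self._processors[name] = processor

    def run(self, name: str, content: str, attributes: dict) -> SketchResult:
        processor = self._processors.get(name)
        if processor is None:
            # Unknown names produce a diagnostic instead of executing anything.
            return SketchResult(diagnostics=[f"unknown processor: {name}"])
        return processor(content, attributes)


def make_include(resolve: Callable[[str], tuple[str, str]]) -> Processor:
    """Build an include processor around a resolver returning (text, target)."""

    def include(content: str, attributes: dict) -> SketchResult:
        ref = attributes.get("ref")
        if not ref:
            return SketchResult(diagnostics=["include requires a ref attribute"])
        text, target = resolve(ref)
        return SketchResult(content=text, dependencies=[target])

    return include


registry = SketchRegistry()
registry.register("identity", lambda content, attributes: SketchResult(content=content))
registry.register("uppercase", lambda content, attributes: SketchResult(content=content.upper()))
```

Because every processor returns the same envelope, a caller can collect diagnostics and dependencies uniformly, whichever processor ran.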
## CLI
Run processors in a document:
```bash
mkt process examples/references/context.md --format json
```
Text output reports processor validity, block IDs, and the first generated
content line. JSON/YAML output includes diagnostics, dependencies, and
provenance.
## Extension Boundary
The registry is deliberately small. It does not render a final document yet and
does not execute shell, Python, SQL, or LLM calls. Those can become opt-in
processors later, but they should use the same result envelope so diagnostics,
dependencies, provenance, cache invalidation, and access-control hooks stay
consistent.


@@ -27,6 +27,10 @@ Supported operations:
The API equivalent is `transform_markdown(...)`.
Heading shifts are token-safe: Markdown fenced and indented code blocks are
left untouched even if their lines look like headings. `TransformResult`
includes structured provenance events alongside the older operation-name list.
## Compose
Use `mkt compose` to concatenate Markdown inputs with predictable separators:
@@ -79,5 +83,12 @@ Resolution rules:
directory.
- Recursive includes are resolved up to `--max-depth`.
- Cycles and missing files fail with explicit errors.
- Include markers inside fenced or indented code blocks are left literal.
The API equivalent is `resolve_includes(...)`.
`IncludeResult` includes structured provenance events. Each include event
records the source marker line when available, the resolved target path,
dependency edge, selector, heading shift, and frontmatter policy. This is the
first provenance envelope used by later WP-0010 processor, source-map, and
explode/implode work.


@@ -32,7 +32,7 @@ and descriptions mirror the operational view.
| `MKTT-WP-0004` | complete | done | `MKTT-WP-0001`, `MKTT-WP-0002` | Contract framework is complete and informs later validation/generation work. |
| `MKTT-WP-0003` | complete | done | `MKTT-WP-0001`, `MKTT-WP-0002`, `MKTT-WP-0004` | Core toolkit implementation is complete. |
| `MKTT-WP-0006` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T005` | Ready after transform/composition shape is clear; should account for future reference/provenance needs. |
| `MKTT-WP-0010` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Trigger is satisfied; keep as the richer content-reference, processor, explode/implode, and weave/tangle track. |
| `MKTT-WP-0010` | complete | done | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Content references, processors, explode/implode, weave/tangle, content classes, and migration examples are complete as the first WP-0010 extension layer. |
| `MKTT-WP-0007` | P2 | todo | `MKTT-WP-0006` | First practical cache backend use case: AST/JSONPath/SQLite/FTS. |
| `MKTT-WP-0005` | P2 | todo | `MKTT-WP-0003`, `MKTT-WP-0004` | Pick up when generation/form/context or semantic assessment pressure appears. |
| `MKTT-WP-0011` | P2 | todo | `MKTT-WP-0003`; task-level triggers: `MKTT-WP-0010-T001`, `MKTT-WP-0010-T005` | Declarative Markdown dataflow workflows: source extraction, deterministic/assisted processing, and multi-output generation. |


@@ -0,0 +1,30 @@
classes:
  base-prd:
    slots:
      sections:
        - Problem
        - Decision
      assertions:
        tone: plain
        audience: product
  enterprise:
    extends:
      - base-prd
    slots:
      sections:
        - Compliance
      assertions:
        audience: enterprise buyers
    merge_policies:
      sections: append
      assertions: deep_merge
  enterprise-prd:
    extends:
      - enterprise
    slots:
      sections:
        - Rollout
    merge_policies:
      sections: append

examples/literate/app.md Normal file

@@ -0,0 +1,15 @@
# Literate App Example
This example explains the helper before showing the application entry point.
```python {#helpers}
def helper():
    return "ready"
```
```python {#main tangle="src/app.py"}
<<helpers>>
def main():
    return helper()
```


@@ -0,0 +1,17 @@
---
title: Legacy Explode Successor
---
Opening material that used to be easy to lose in section-only exports.
# Overview
The successor explode flow preserves preamble, headings, order, and frontmatter.
## Detail
Nested sections remain addressable and roundtrip through the manifest.
# Follow-Up
Later sections keep their document order.


@@ -0,0 +1,12 @@
# Legacy Literate Successor
```python {#config}
CONFIG = {"ready": True}
```
```python {#main tangle="src/app.py"}
<<config>>
def main():
    return CONFIG["ready"]
```


@@ -0,0 +1,3 @@
# Path Include
<!-- mkt:include path="standard/clauses.md" selector="sections[heading~=Warranty]" -->


@@ -0,0 +1,13 @@
---
title: Legacy Transclusion Successor
namespaces:
  std: ./standard
---
# Contract Draft
The old broad transclusion idea is now split into path includes and
reference-backed processors.
```mkt-include {#payment-clause ref="std:clauses.md#payment"}
```


@@ -0,0 +1,9 @@
# Standard Clauses
## Payment {#payment}
Payment is due within 30 days.
## Warranty {#warranty}
Warranty begins on the effective date.


@@ -0,0 +1,26 @@
---
title: Reference Context
namespaces:
  std: ./standard
---
# Reference Context
This document declares the namespaces used by reference examples.
## Local Overview
Local sections can be addressed with `#local-overview`.
<!-- mkt:region id="summary-snippet" tags="reuse summary" -->
This named region can be resolved with `#region:summary-snippet` or
`#tag:summary`.
<!-- /mkt:region -->
```python {#example-loader tags="code demo" tangle="src/example_loader.py"}
def load_example():
    return "ready"
```
```mkt-include {#payment-example ref="std:clauses.md#payment-terms"}
```


@@ -0,0 +1,9 @@
# Standard Clauses
## Payment Terms {#payment-terms}
Payment is due within 30 days unless a governing contract says otherwise.
## Warranty
The warranty period starts on the effective date.


@@ -32,7 +32,26 @@ from markitect_tool.cache import (
save_cache,
scan_markdown_files,
)
from markitect_tool.content_class import (
ClassCompositionResult,
ContentClass,
ContentClassRegistry,
ContentClassResolutionError,
load_content_class_file,
load_content_classes,
)
from markitect_tool.diagnostics import Diagnostic, SourceLocation
from markitect_tool.explode import (
EXPLODE_MANIFEST_NAME,
ExplodeEntry,
ExplodeError,
ExplodeManifest,
ExplodeResult,
ImplodeResult,
explode_markdown_file,
implode_markdown_directory,
load_explode_manifest,
)
from markitect_tool.generation import (
GeneratedDocument,
GenerationHookRequest,
@@ -44,21 +63,55 @@ from markitect_tool.generation import (
load_generation_plan_file,
run_generation_plan,
)
from markitect_tool.literate import (
CodeChunk,
LiterateFile,
TangleResult,
WeaveResult,
discover_code_chunks,
tangle_markdown,
weave_markdown,
write_tangle_files,
)
from markitect_tool.ops import (
ComposeResult,
IncludeError,
IncludeResult,
OperationProvenance,
TransformResult,
compose_files,
resolve_includes,
transform_markdown,
)
from markitect_tool.processor import (
FencedProcessorBlock,
ProcessorContext,
ProcessorOutputFile,
ProcessorRegistry,
ProcessorRequest,
ProcessorResult,
ProcessorRun,
default_processor_registry,
discover_fenced_processors,
run_fenced_processors,
)
from markitect_tool.query import (
InvalidQueryError,
QueryMatch,
extract_document,
query_document,
)
from markitect_tool.reference import (
ContentUnit,
ReferenceAddress,
ReferenceContext,
ReferenceResolution,
ReferenceResolutionError,
SourceSpan as ReferenceSourceSpan,
load_namespaces,
parse_reference,
resolve_reference,
)
from markitect_tool.schema import (
MarkdownSchema,
SchemaValidationResult,
@@ -109,8 +162,23 @@ __all__ = [
"load_cache",
"save_cache",
"scan_markdown_files",
"ClassCompositionResult",
"ContentClass",
"ContentClassRegistry",
"ContentClassResolutionError",
"load_content_class_file",
"load_content_classes",
"Diagnostic",
"SourceLocation",
"EXPLODE_MANIFEST_NAME",
"ExplodeEntry",
"ExplodeError",
"ExplodeManifest",
"ExplodeResult",
"ImplodeResult",
"explode_markdown_file",
"implode_markdown_directory",
"load_explode_manifest",
"GeneratedDocument",
"GenerationHookRequest",
"GenerationHookResult",
@@ -120,17 +188,45 @@ __all__ = [
"generate_with_hook",
"load_generation_plan_file",
"run_generation_plan",
"CodeChunk",
"LiterateFile",
"TangleResult",
"WeaveResult",
"discover_code_chunks",
"tangle_markdown",
"weave_markdown",
"write_tangle_files",
"ComposeResult",
"IncludeError",
"IncludeResult",
"OperationProvenance",
"TransformResult",
"compose_files",
"resolve_includes",
"transform_markdown",
"FencedProcessorBlock",
"ProcessorContext",
"ProcessorOutputFile",
"ProcessorRegistry",
"ProcessorRequest",
"ProcessorResult",
"ProcessorRun",
"default_processor_registry",
"discover_fenced_processors",
"run_fenced_processors",
"InvalidQueryError",
"QueryMatch",
"extract_document",
"query_document",
"ContentUnit",
"ReferenceAddress",
"ReferenceContext",
"ReferenceResolution",
"ReferenceResolutionError",
"ReferenceSourceSpan",
"load_namespaces",
"parse_reference",
"resolve_reference",
"MissingTemplateVariable",
"TemplateAnalysis",
"TemplateError",


@@ -16,6 +16,10 @@ from markitect_tool.cache import (
load_cache,
save_cache,
)
from markitect_tool.content_class import (
ContentClassResolutionError,
load_content_class_file,
)
from markitect_tool.core import parse_markdown_file
from markitect_tool.contract import (
ContractLoaderError,
@@ -24,6 +28,11 @@ from markitect_tool.contract import (
load_contract_file,
validate_contract,
)
from markitect_tool.explode import (
ExplodeError,
explode_markdown_file,
implode_markdown_directory,
)
from markitect_tool.generation import (
GenerationPlanError,
generate_stub_from_contract,
@@ -31,8 +40,16 @@ from markitect_tool.generation import (
load_generation_plan_file,
run_generation_plan,
)
from markitect_tool.literate import tangle_markdown, weave_markdown, write_tangle_files
from markitect_tool.ops import IncludeError, compose_files, resolve_includes, transform_markdown
from markitect_tool.processor import ProcessorContext, run_fenced_processors
from markitect_tool.query import InvalidQueryError, extract_document, query_document
from markitect_tool.reference import (
ReferenceContext,
ReferenceResolutionError,
load_namespaces,
resolve_reference,
)
from markitect_tool.schema import load_schema_file, validate_markdown_file, validate_schema
from markitect_tool.template import (
MissingTemplateVariable,
@@ -296,6 +313,224 @@ def include(
_emit_markdown_result(result.to_dict(), output_format, output)
@main.command()
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option(
    "--output-dir",
    required=True,
    type=click.Path(file_okay=False, path_type=Path),
    help="Directory to write exploded Markdown files and manifest into.",
)
@click.option(
    "--variant",
    type=click.Choice(["flat", "hierarchical"], case_sensitive=False),
    default="flat",
    show_default=True,
)
@click.option("--force", is_flag=True, help="Allow writing into a non-empty output directory.")
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
    default="text",
    show_default=True,
)
def explode(
    file: Path,
    output_dir: Path,
    variant: str,
    force: bool,
    output_format: str,
) -> None:
    """Explode a Markdown file into reversible section files."""
    try:
        result = explode_markdown_file(file, output_dir, variant=variant, overwrite=force)
    except ExplodeError as exc:
        raise click.ClickException(str(exc)) from exc
    _emit_explode_result(result.to_dict(), output_format)


@main.command()
@click.argument("directory", type=click.Path(exists=True, file_okay=False, path_type=Path))
@click.option(
    "--manifest",
    "manifest_path",
    type=click.Path(exists=True, dir_okay=False, path_type=Path),
    help="Manifest path. Defaults to markitect-explode.yaml in the input directory.",
)
@click.option(
    "--output",
    type=click.Path(dir_okay=False, path_type=Path),
    help="Write imploded Markdown to a file.",
)
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["markdown", "json", "yaml"], case_sensitive=False),
    default="markdown",
    show_default=True,
)
def implode(
    directory: Path,
    manifest_path: Path | None,
    output: Path | None,
    output_format: str,
) -> None:
    """Implode a Markdown directory created by `mkt explode`."""
    try:
        result = implode_markdown_directory(directory, manifest_path=manifest_path)
    except ExplodeError as exc:
        raise click.ClickException(str(exc)) from exc
    _emit_markdown_result(result.to_dict(), output_format, output)


@main.group("ref")
def ref_group() -> None:
    """Resolve namespaced Markdown content references."""


@ref_group.command("resolve")
@click.argument("context_file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.argument("reference")
@click.option(
    "--root",
    type=click.Path(exists=True, file_okay=False, path_type=Path),
    default=Path("."),
    show_default=True,
    help="Root that relative paths and namespaces must stay within.",
)
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
    default="text",
    show_default=True,
)
def ref_resolve(context_file: Path, reference: str, root: Path, output_format: str) -> None:
    """Resolve a content reference using a Markdown document as context."""
    context_document = parse_markdown_file(context_file)
    context = ReferenceContext.from_document(
        context_document,
        root=root,
        current_path=context_file,
    )
    try:
        resolution = resolve_reference(reference, context=context)
    except ReferenceResolutionError as exc:
        raise click.ClickException(str(exc)) from exc
    _emit_reference_result(resolution.to_dict(), output_format)


@main.command("process")
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option(
    "--root",
    type=click.Path(exists=True, file_okay=False, path_type=Path),
    default=Path("."),
    show_default=True,
    help="Root used for relative processor references.",
)
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
    default="text",
    show_default=True,
)
def process(file: Path, root: Path, output_format: str) -> None:
    """Run deterministic fenced-block processors in a Markdown file."""
    document = parse_markdown_file(file)
    context = ProcessorContext(
        root=root,
        current_path=file,
        namespaces=load_namespaces(document.frontmatter),
    )
    result = run_fenced_processors(
        file.read_text(encoding="utf-8"),
        context=context,
        source_path=file,
    )
    _emit_processor_run(result.to_dict(), output_format)
    raise click.exceptions.Exit(0 if result.valid else 1)


@main.group("class")
def class_group() -> None:
    """Resolve deterministic content classes."""


@class_group.command("resolve")
@click.argument("class_file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.argument("class_name")
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
    default="text",
    show_default=True,
)
def class_resolve(class_file: Path, class_name: str, output_format: str) -> None:
    """Resolve content class inheritance and merged slots."""
    try:
        registry = load_content_class_file(class_file)
        result = registry.compose(class_name)
    except ContentClassResolutionError as exc:
        raise click.ClickException(str(exc)) from exc
    _emit_content_class_result(result.to_dict(), output_format)
    raise click.exceptions.Exit(0 if result.valid else 1)


@main.command()
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option(
    "--output-dir",
    type=click.Path(file_okay=False, path_type=Path),
    help="Write tangled files under this directory. Omit for dry JSON/YAML/text output.",
)
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
    default="text",
    show_default=True,
)
def tangle(file: Path, output_dir: Path | None, output_format: str) -> None:
    """Tangle named Markdown code chunks into target files."""
    result = tangle_markdown(file.read_text(encoding="utf-8"), source_path=file)
    data = result.to_dict()
    if output_dir and result.valid:
        data["written_files"] = write_tangle_files(result, output_dir)
    _emit_tangle_result(data, output_format)
    raise click.exceptions.Exit(0 if result.valid else 1)


@main.command()
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option(
    "--output",
    type=click.Path(dir_okay=False, path_type=Path),
    help="Write woven Markdown to a file.",
)
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["markdown", "json", "yaml"], case_sensitive=False),
    default="markdown",
    show_default=True,
)
def weave(file: Path, output: Path | None, output_format: str) -> None:
    """Weave Markdown documentation with a deterministic chunk index."""
    result = weave_markdown(file.read_text(encoding="utf-8"), source_path=file)
    _emit_markdown_result(result.to_dict(), output_format, output)


@main.group()
def cache() -> None:
    """Fingerprint Markdown files and detect changed inputs."""
@@ -788,6 +1023,83 @@ def _emit_cache_data(data: dict, output_format: str) -> None:
click.echo(f"written: {data['written']}")
def _emit_reference_result(data: dict, output_format: str) -> None:
    if output_format == "json":
        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
    elif output_format == "yaml":
        click.echo(yaml.safe_dump(data, sort_keys=False))
    else:
        click.echo(f"{data['count']} unit(s)")
        click.echo(f"target: {data['target_path']}")
        for unit in data["units"]:
            span = unit.get("span", {})
            line = f":{span['line_start']}" if span.get("line_start") else ""
            click.echo(f"- {unit['kind']} {unit['unit_id']} {unit['source_path']}{line}")
            if unit.get("name"):
                click.echo(f" {unit['name']}")


def _emit_explode_result(data: dict, output_format: str) -> None:
    if output_format == "json":
        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
    elif output_format == "yaml":
        click.echo(yaml.safe_dump(data, sort_keys=False))
    else:
        manifest = data["manifest"]
        click.echo(f"manifest: {data['manifest_path']}")
        click.echo(f"variant: {manifest['variant']}")
        click.echo(f"entries: {len(manifest['entries'])}")
        for entry in manifest["entries"]:
            click.echo(f"- {entry['kind']} {entry['file']}")


def _emit_processor_run(data: dict, output_format: str) -> None:
    if output_format == "json":
        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
    elif output_format == "yaml":
        click.echo(yaml.safe_dump(data, sort_keys=False))
    else:
        click.echo("valid" if data["valid"] else "invalid")
        click.echo(f"processors: {data['count']}")
        for block, result in zip(data["blocks"], data["results"], strict=False):
            line = f":{block['line_start']}" if block.get("line_start") else ""
            click.echo(f"- {block['processor']} {block['unit_id']}{line}")
            if result.get("content"):
                click.echo(f" content: {result['content'].splitlines()[0]}")
            for diagnostic in result.get("diagnostics", []):
                click.echo(f" [{diagnostic['severity']}] {diagnostic['code']}: {diagnostic['message']}")


def _emit_content_class_result(data: dict, output_format: str) -> None:
    if output_format == "json":
        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
    elif output_format == "yaml":
        click.echo(yaml.safe_dump(data, sort_keys=False))
    else:
        click.echo("valid" if data["valid"] else "invalid")
        click.echo("linearization: " + " -> ".join(data["linearization"]))
        for slot, value in data.get("slots", {}).items():
            click.echo(f"- {slot}: {value}")
        for diagnostic in data.get("diagnostics", []):
            click.echo(f"! [{diagnostic['severity']}] {diagnostic['code']}: {diagnostic['message']}")


def _emit_tangle_result(data: dict, output_format: str) -> None:
    if output_format == "json":
        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
    elif output_format == "yaml":
        click.echo(yaml.safe_dump(data, sort_keys=False))
    else:
        click.echo("valid" if data["valid"] else "invalid")
        click.echo(f"files: {len(data['files'])}")
        for file in data["files"]:
            click.echo(f"- {file['path']}: {', '.join(file['chunk_ids'])}")
        for diagnostic in data.get("diagnostics", []):
            click.echo(f"! [{diagnostic['severity']}] {diagnostic['code']}: {diagnostic['message']}")
        for written in data.get("written_files", []):
            click.echo(f"written: {written}")


def _emit_jsonish(data: dict, output_format: str) -> None:
    if output_format == "yaml":
        click.echo(yaml.safe_dump(data, sort_keys=False))


@@ -0,0 +1,19 @@
"""Deterministic content class composition."""
from markitect_tool.content_class.engine import (
ClassCompositionResult,
ContentClass,
ContentClassRegistry,
ContentClassResolutionError,
load_content_class_file,
load_content_classes,
)
__all__ = [
"ClassCompositionResult",
"ContentClass",
"ContentClassRegistry",
"ContentClassResolutionError",
"load_content_class_file",
"load_content_classes",
]


@@ -0,0 +1,225 @@
"""Small deterministic content class resolver."""
from __future__ import annotations
from copy import deepcopy
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any
import yaml
from markitect_tool.diagnostics import Diagnostic
class ContentClassResolutionError(ValueError):
"""Raised when content class definitions cannot be loaded."""
@dataclass(frozen=True)
class ContentClass:
"""A data-defined content class."""
name: str
extends: list[str] = field(default_factory=list)
slots: dict[str, Any] = field(default_factory=dict)
merge_policies: dict[str, str] = field(default_factory=dict)
def to_dict(self) -> dict[str, Any]:
return {key: value for key, value in asdict(self).items() if value not in ({}, [], None)}
@dataclass(frozen=True)
class ClassCompositionResult:
"""Resolved content class slots plus diagnostics."""
class_name: str
linearization: list[str]
slots: dict[str, Any]
diagnostics: list[Diagnostic] = field(default_factory=list)
@property
def valid(self) -> bool:
return not any(diagnostic.severity == "error" for diagnostic in self.diagnostics)
def to_dict(self) -> dict[str, Any]:
return {
"valid": self.valid,
"class_name": self.class_name,
"linearization": self.linearization,
"slots": self.slots,
"diagnostics": [diagnostic.to_dict() for diagnostic in self.diagnostics],
}
class ContentClassRegistry:
"""Registry and resolver for content classes."""
def __init__(self, classes: dict[str, ContentClass] | None = None) -> None:
self.classes = classes or {}
def add(self, content_class: ContentClass) -> None:
self.classes[content_class.name] = content_class
def linearize(self, class_name: str) -> list[str]:
if class_name not in self.classes:
raise ContentClassResolutionError(f"Unknown content class `{class_name}`")
return self._linearize(class_name, [])
def compose(self, class_name: str) -> ClassCompositionResult:
diagnostics: list[Diagnostic] = []
try:
linearization = self.linearize(class_name)
except ContentClassResolutionError as exc:
return ClassCompositionResult(
class_name=class_name,
linearization=[],
slots={},
diagnostics=[
Diagnostic(
severity="error",
code="content_class.resolution_error",
message=str(exc),
)
],
)
slots: dict[str, Any] = {}
for name in reversed(linearization):
content_class = self.classes[name]
for slot, value in content_class.slots.items():
policy = content_class.merge_policies.get(slot, "replace")
try:
slots[slot] = _merge_slot(slots.get(slot), value, policy)
except ContentClassResolutionError as exc:
diagnostics.append(
Diagnostic(
severity="error",
code="content_class.merge_conflict",
message=str(exc),
details={"class": name, "slot": slot, "policy": policy},
)
)
return ClassCompositionResult(
class_name=class_name,
linearization=linearization,
slots=slots,
diagnostics=diagnostics,
)
def _linearize(self, class_name: str, stack: list[str]) -> list[str]:
if class_name in stack:
raise ContentClassResolutionError(
"Cyclic content class inheritance: " + " -> ".join(stack + [class_name])
)
content_class = self.classes[class_name]
parent_mros = [
self._linearize(parent, stack + [class_name])
for parent in content_class.extends
if _known_parent(parent, self.classes)
]
missing = [parent for parent in content_class.extends if parent not in self.classes]
if missing:
raise ContentClassResolutionError(
f"Content class `{class_name}` extends unknown class(es): {', '.join(missing)}"
)
return [class_name] + _c3_merge(parent_mros + [list(content_class.extends)])
def load_content_class_file(path: str | Path) -> ContentClassRegistry:
"""Load content class definitions from YAML."""
data = yaml.safe_load(Path(path).read_text(encoding="utf-8"))
if not isinstance(data, dict):
raise ContentClassResolutionError("Content class file must be a mapping")
return load_content_classes(data)
def load_content_classes(data: dict[str, Any]) -> ContentClassRegistry:
"""Load content class definitions from a mapping."""
raw_classes = data.get("classes", data)
if not isinstance(raw_classes, dict):
raise ContentClassResolutionError("Content classes must be a mapping")
classes: dict[str, ContentClass] = {}
for name, raw_class in raw_classes.items():
if not isinstance(raw_class, dict):
raise ContentClassResolutionError(f"Content class `{name}` must be a mapping")
extends = raw_class.get("extends", [])
if isinstance(extends, str):
extends = [extends]
if not isinstance(extends, list):
raise ContentClassResolutionError(f"Content class `{name}` extends must be a list")
slots = raw_class.get("slots", {})
policies = raw_class.get("merge_policies", {})
if not isinstance(slots, dict) or not isinstance(policies, dict):
raise ContentClassResolutionError(
f"Content class `{name}` slots and merge_policies must be mappings"
)
classes[str(name)] = ContentClass(
name=str(name),
extends=[str(parent) for parent in extends],
slots=slots,
merge_policies={str(key): str(value) for key, value in policies.items()},
)
return ContentClassRegistry(classes)
def _c3_merge(sequences: list[list[str]]) -> list[str]:
result: list[str] = []
sequences = [list(sequence) for sequence in sequences if sequence]
while sequences:
candidate = None
for sequence in sequences:
head = sequence[0]
if not any(head in other[1:] for other in sequences):
candidate = head
break
if candidate is None:
raise ContentClassResolutionError("Inconsistent content class precedence order")
result.append(candidate)
sequences = [
[item for item in sequence if item != candidate]
for sequence in sequences
]
sequences = [sequence for sequence in sequences if sequence]
return result
def _merge_slot(existing: Any, value: Any, policy: str) -> Any:
    # Validate the policy first so an unknown policy is reported even when
    # the slot has no earlier value to merge with.
    if policy not in {"replace", "append", "prepend", "deep_merge", "error_on_conflict"}:
        raise ContentClassResolutionError(f"Unknown merge policy `{policy}`")
    incoming = deepcopy(value)
    if existing is None:
        return incoming
    if policy == "replace":
        return incoming
    if policy == "append":
        return _as_list(existing) + _as_list(incoming)
    if policy == "prepend":
        return _as_list(incoming) + _as_list(existing)
    if policy == "deep_merge":
        if not isinstance(existing, dict) or not isinstance(incoming, dict):
            raise ContentClassResolutionError("deep_merge requires mapping values")
        return _deep_merge(existing, incoming)
    # Remaining policy: error_on_conflict.
    if existing != incoming:
        raise ContentClassResolutionError("slot conflict")
    return existing
def _deep_merge(left: dict[str, Any], right: dict[str, Any]) -> dict[str, Any]:
merged = deepcopy(left)
for key, value in right.items():
if isinstance(merged.get(key), dict) and isinstance(value, dict):
merged[key] = _deep_merge(merged[key], value)
else:
merged[key] = deepcopy(value)
return merged
def _as_list(value: Any) -> list[Any]:
return value if isinstance(value, list) else [value]
def _known_parent(parent: str, classes: dict[str, ContentClass]) -> bool:
return parent in classes
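The resolver above can be illustrated with a small standalone sketch. It is not the module's API: the class table, the `audited` class, and the function names are hypothetical, and only the `replace` and `append` policies are modelled. It shows a C3-style linearization of a diamond followed by a base-to-leaf merge.

```python
from copy import deepcopy

# Hypothetical class table: name -> (extends, slots, merge_policies).
CLASSES = {
    "base-prd": ([], {"sections": ["Problem", "Decision"]}, {}),
    "audited": (["base-prd"], {"sections": ["Audit Trail"]}, {"sections": "append"}),
    "enterprise": (["base-prd"], {"sections": ["Compliance"]}, {"sections": "append"}),
    "enterprise-prd": (["enterprise", "audited"], {}, {}),
}


def c3_merge(sequences):
    """Merge linearizations, always taking a head that appears in no tail."""
    result = []
    pending = [list(seq) for seq in sequences if seq]
    while pending:
        for seq in pending:
            head = seq[0]
            if not any(head in other[1:] for other in pending):
                break
        else:
            raise ValueError("inconsistent precedence order")
        result.append(head)
        pending = [[item for item in seq if item != head] for seq in pending]
        pending = [seq for seq in pending if seq]
    return result


def linearize(name):
    parents = CLASSES[name][0]
    return [name] + c3_merge([linearize(p) for p in parents] + [list(parents)])


def compose(name):
    """Merge slots from the most basic class to the leaf."""
    slots = {}
    for class_name in reversed(linearize(name)):
        _, class_slots, policies = CLASSES[class_name]
        for slot, value in class_slots.items():
            incoming = deepcopy(value)
            if slot in slots and policies.get(slot, "replace") == "append":
                slots[slot] = slots[slot] + incoming
            else:
                slots[slot] = incoming
    return slots


ORDER = linearize("enterprise-prd")
SLOTS = compose("enterprise-prd")
print(ORDER)
print(SLOTS)
```

Because merging walks the reversed linearization, the shared base contributes first, and each later class applies its own policy for the slot it touches.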


@@ -0,0 +1,25 @@
"""Reversible explode/implode operations for Markdown documents."""
from markitect_tool.explode.engine import (
EXPLODE_MANIFEST_NAME,
ExplodeEntry,
ExplodeError,
ExplodeManifest,
ExplodeResult,
ImplodeResult,
explode_markdown_file,
implode_markdown_directory,
load_explode_manifest,
)
__all__ = [
"EXPLODE_MANIFEST_NAME",
"ExplodeEntry",
"ExplodeError",
"ExplodeManifest",
"ExplodeResult",
"ImplodeResult",
"explode_markdown_file",
"implode_markdown_directory",
"load_explode_manifest",
]
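The explode/implode pair exported here is described as reversible. A minimal sketch of that property follows, assuming a simplified regex for ATX headings in place of the project's `parse_markdown`; `split_sections` and `hash_text` are illustrative names. Every character of the source ends up in exactly one part, so concatenating the parts in order reproduces the original text and its hash.

```python
import hashlib
import re

# Simplified ATX heading detection (an assumption; it does not skip code fences).
HEADING = re.compile(r"^#{1,6}\s")


def hash_text(text):
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def split_sections(markdown):
    """Split into a preamble plus one part per heading, keeping every character."""
    parts = [""]
    for line in markdown.splitlines(keepends=True):
        if HEADING.match(line):
            parts.append(line)
        else:
            parts[-1] += line
    return parts


SOURCE = "Intro text.\n\n# Problem\n\nWhy.\n\n## Detail\n\nMore.\n\n# Decision\n\nWhat.\n"
PARTS = split_sections(SOURCE)
REBUILT = "".join(PARTS)
print(len(PARTS), hash_text(SOURCE) == hash_text(REBUILT))
```

Comparing the stored source hash with the hash of the rebuilt text is one way to detect that an exploded file was edited before imploding.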


@@ -0,0 +1,324 @@
"""Manifest-first reversible explode/implode for Markdown files."""
from __future__ import annotations
import hashlib
import re
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any
import yaml
from markitect_tool.core import Heading, parse_markdown
EXPLODE_MANIFEST_NAME = "markitect-explode.yaml"
class ExplodeError(ValueError):
"""Raised when explode or implode cannot preserve a safe roundtrip."""
@dataclass(frozen=True)
class ExplodeEntry:
"""One file entry in an exploded Markdown directory."""
kind: str
file: str
order: int
unit_id: str
line_start: int
line_end: int
heading_level: int | None = None
heading_text: str | None = None
content_hash: str = ""
def to_dict(self) -> dict[str, Any]:
return {key: value for key, value in asdict(self).items() if value is not None}
@dataclass(frozen=True)
class ExplodeManifest:
"""Manifest used to implode an exploded Markdown directory."""
version: int
source_path: str
source_hash: str
variant: str
frontmatter_raw: str = ""
entries: list[ExplodeEntry] = field(default_factory=list)
def to_dict(self) -> dict[str, Any]:
return {
"version": self.version,
"source_path": self.source_path,
"source_hash": self.source_hash,
"variant": self.variant,
"frontmatter_raw": self.frontmatter_raw,
"entries": [entry.to_dict() for entry in self.entries],
}
@dataclass(frozen=True)
class ExplodeResult:
"""Result of exploding a Markdown file into a directory."""
manifest_path: str
output_dir: str
manifest: ExplodeManifest
written_files: list[str]
def to_dict(self) -> dict[str, Any]:
return {
"manifest_path": self.manifest_path,
"output_dir": self.output_dir,
"manifest": self.manifest.to_dict(),
"written_files": self.written_files,
}
@dataclass(frozen=True)
class ImplodeResult:
"""Result of rebuilding Markdown from an explode manifest."""
markdown: str
manifest_path: str
source_hash: str
current_hash: str
entries: list[str]
def to_dict(self) -> dict[str, Any]:
return asdict(self)
def explode_markdown_file(
path: str | Path,
output_dir: str | Path,
*,
variant: str = "flat",
overwrite: bool = False,
) -> ExplodeResult:
"""Explode a Markdown file into section files plus a roundtrip manifest."""
if variant not in {"flat", "hierarchical"}:
raise ExplodeError("Explode variant must be `flat` or `hierarchical`")
source_path = Path(path)
target_dir = Path(output_dir)
markdown = source_path.read_text(encoding="utf-8")
if target_dir.exists() and any(target_dir.iterdir()) and not overwrite:
raise ExplodeError(f"Output directory is not empty: {target_dir}")
target_dir.mkdir(parents=True, exist_ok=True)
frontmatter_raw, body_start_line = _split_frontmatter_raw(markdown)
entries_with_text = _explode_entries(markdown, body_start_line, variant)
written_files: list[str] = []
entries: list[ExplodeEntry] = []
for entry, text in entries_with_text:
entry_path = _safe_entry_path(target_dir, entry.file)
entry_path.parent.mkdir(parents=True, exist_ok=True)
entry_path.write_text(text, encoding="utf-8")
written_files.append(str(entry_path))
entries.append(entry)
manifest = ExplodeManifest(
version=1,
source_path=str(source_path),
source_hash=_hash_text(markdown),
variant=variant,
frontmatter_raw=frontmatter_raw,
entries=entries,
)
manifest_path = target_dir / EXPLODE_MANIFEST_NAME
manifest_path.write_text(yaml.safe_dump(manifest.to_dict(), sort_keys=False), encoding="utf-8")
return ExplodeResult(
manifest_path=str(manifest_path),
output_dir=str(target_dir),
manifest=manifest,
written_files=written_files + [str(manifest_path)],
)
def implode_markdown_directory(
directory: str | Path,
*,
manifest_path: str | Path | None = None,
) -> ImplodeResult:
"""Implode a Markdown directory created by :func:`explode_markdown_file`."""
root = Path(directory)
manifest_file = Path(manifest_path) if manifest_path else root / EXPLODE_MANIFEST_NAME
manifest = load_explode_manifest(manifest_file)
parts = [manifest.frontmatter_raw]
entry_files: list[str] = []
for entry in manifest.entries:
entry_path = _safe_entry_path(root, entry.file)
if not entry_path.exists() or not entry_path.is_file():
raise ExplodeError(f"Exploded entry file not found: {entry.file}")
parts.append(entry_path.read_text(encoding="utf-8"))
entry_files.append(str(entry_path))
markdown = "".join(parts)
return ImplodeResult(
markdown=markdown,
manifest_path=str(manifest_file),
source_hash=manifest.source_hash,
current_hash=_hash_text(markdown),
entries=entry_files,
)
def load_explode_manifest(path: str | Path) -> ExplodeManifest:
"""Load an explode manifest from YAML."""
manifest_path = Path(path)
data = yaml.safe_load(manifest_path.read_text(encoding="utf-8"))
if not isinstance(data, dict):
raise ExplodeError("Explode manifest must be a mapping")
entries = data.get("entries", [])
if not isinstance(entries, list):
raise ExplodeError("Explode manifest entries must be a list")
return ExplodeManifest(
version=int(data.get("version", 1)),
source_path=str(data.get("source_path", "")),
source_hash=str(data.get("source_hash", "")),
variant=str(data.get("variant", "flat")),
frontmatter_raw=str(data.get("frontmatter_raw", "")),
entries=[_entry_from_mapping(entry) for entry in entries],
)
def _explode_entries(
markdown: str,
body_start_line: int,
variant: str,
) -> list[tuple[ExplodeEntry, str]]:
lines = markdown.splitlines(keepends=True)
headings = parse_markdown(markdown).headings
entries: list[tuple[ExplodeEntry, str]] = []
used_ids: dict[str, int] = {}
order = 0
first_heading_line = headings[0].line if headings else len(lines) + 1
preamble_text = "".join(lines[body_start_line - 1:first_heading_line - 1])
if preamble_text or not headings:
entry = ExplodeEntry(
kind="preamble",
file="00-preamble.md",
order=order,
unit_id="preamble",
line_start=body_start_line,
line_end=max(first_heading_line - 1, body_start_line),
content_hash=_hash_text(preamble_text),
)
entries.append((entry, preamble_text))
order += 1
hierarchy: dict[int, str] = {}
for index, heading in enumerate(headings):
start = heading.line
end = headings[index + 1].line - 1 if index + 1 < len(headings) else len(lines)
text = "".join(lines[start - 1:end])
unit_id = _dedupe_id(_slug(_heading_title(heading)), used_ids)
file_path = _entry_file_for_heading(heading, index + 1, unit_id, variant, hierarchy)
entry = ExplodeEntry(
kind="section",
file=file_path,
order=order,
unit_id=unit_id,
line_start=start,
line_end=end,
heading_level=heading.level,
heading_text=heading.text,
content_hash=_hash_text(text),
)
entries.append((entry, text))
order += 1
return entries
def _entry_file_for_heading(
heading: Heading,
index: int,
unit_id: str,
variant: str,
hierarchy: dict[int, str],
) -> str:
filename = f"{index:02d}-{unit_id}.md"
if variant == "flat":
return f"sections/{filename}"
for level in list(hierarchy):
if level >= heading.level:
del hierarchy[level]
parents = [hierarchy[level] for level in sorted(hierarchy) if level < heading.level]
hierarchy[heading.level] = f"{index:02d}-{unit_id}"
return str(Path(*parents, filename)) if parents else filename
def _entry_from_mapping(data: Any) -> ExplodeEntry:
if not isinstance(data, dict):
raise ExplodeError("Explode manifest entry must be a mapping")
return ExplodeEntry(
kind=str(data["kind"]),
file=str(data["file"]),
order=int(data["order"]),
unit_id=str(data["unit_id"]),
line_start=int(data["line_start"]),
line_end=int(data["line_end"]),
heading_level=int(data["heading_level"]) if data.get("heading_level") is not None else None,
heading_text=str(data["heading_text"]) if data.get("heading_text") is not None else None,
content_hash=str(data.get("content_hash", "")),
)
def _safe_entry_path(root: Path, relative_path: str) -> Path:
path = Path(relative_path)
if path.is_absolute():
raise ExplodeError(f"Exploded entry path must be relative: {relative_path}")
resolved = (root / path).resolve()
try:
resolved.relative_to(root.resolve())
except ValueError as exc:
raise ExplodeError(f"Exploded entry path escapes directory: {relative_path}") from exc
return resolved
def _split_frontmatter_raw(markdown: str) -> tuple[str, int]:
if not markdown.startswith("---\n"):
return "", 1
end = markdown.find("\n---", 4)
if end == -1:
return "", 1
closing_end = markdown.find("\n", end + 4)
if closing_end == -1:
closing_end = len(markdown)
else:
closing_end += 1
frontmatter_raw = markdown[:closing_end]
return frontmatter_raw, frontmatter_raw.count("\n") + 1
def _heading_title(heading: Heading) -> str:
text = re.sub(r"\s+\{#[A-Za-z0-9_.:-]+\}\s*$", "", heading.text.strip())
return text or "section"
def _dedupe_id(unit_id: str, used_ids: dict[str, int]) -> str:
count = used_ids.get(unit_id, 0) + 1
used_ids[unit_id] = count
return unit_id if count == 1 else f"{unit_id}-{count}"
def _slug(value: str) -> str:
slug = re.sub(r"[^a-z0-9_.:-]+", "-", value.strip().lower())
slug = re.sub(r"-+", "-", slug).strip("-")
return slug or "section"
def _hash_text(text: str) -> str:
return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
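The path check in `_safe_entry_path` above guards against manifest entries that point outside the exploded directory. The sketch below restates the same idea in standalone form so it can be run against a temporary directory. `is_safe` and the sample paths are illustrative only.

```python
import tempfile
from pathlib import Path


def safe_entry_path(root, relative_path):
    """Resolve a manifest path and refuse anything outside root."""
    path = Path(relative_path)
    if path.is_absolute():
        raise ValueError(f"path must be relative: {relative_path}")
    resolved = (root / path).resolve()
    try:
        resolved.relative_to(root.resolve())
    except ValueError as exc:
        raise ValueError(f"path escapes directory: {relative_path}") from exc
    return resolved


def is_safe(root, relative_path):
    try:
        safe_entry_path(root, relative_path)
    except ValueError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    ROOT = Path(tmp)
    CANDIDATES = ["sections/01-problem.md", "../outside.md", "/etc/passwd", "a/../../b.md"]
    CHECKS = {name: is_safe(ROOT, name) for name in CANDIDATES}

print(CHECKS)
```

Resolving both the candidate and the root before comparing means that `..` segments are normalized away before the containment test runs.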


@@ -0,0 +1,23 @@
"""Markdown-native literate weave/tangle workflows."""
from markitect_tool.literate.engine import (
CodeChunk,
LiterateFile,
TangleResult,
WeaveResult,
discover_code_chunks,
tangle_markdown,
weave_markdown,
write_tangle_files,
)
__all__ = [
"CodeChunk",
"LiterateFile",
"TangleResult",
"WeaveResult",
"discover_code_chunks",
"tangle_markdown",
"weave_markdown",
"write_tangle_files",
]
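The literate helpers exported here read chunk metadata from the fence info string, for example `python {#main tangle=src/app.py}`. As a rough standalone sketch of that parsing (mirroring the regex and `shlex` approach used in the engine, with illustrative function names), a `#name` token becomes the chunk id, `key=value` tokens become attributes, and bare tokens become `"true"` flags.

```python
import re
import shlex

INFO_RE = re.compile(r"^(?P<language>[^\s{]+)?(?:\s+\{(?P<attrs>.*)\})?\s*$")


def parse_attrs(raw):
    attrs = {}
    for part in shlex.split(raw):  # shlex keeps quoted values together
        if part.startswith("#") and len(part) > 1:
            attrs["id"] = part[1:]
        elif "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
        else:
            attrs[part] = "true"
    return attrs


def parse_fence_info(info):
    match = INFO_RE.match(info.strip())
    if not match:
        return {"language": info.strip()} if info.strip() else {}
    attrs = parse_attrs(match.group("attrs") or "")
    if match.group("language"):
        attrs["language"] = match.group("language")
    return attrs


MAIN = parse_fence_info("python {#main tangle=src/app.py}")
PLAIN = parse_fence_info("python")
QUOTED = parse_fence_info('bash {#setup target="bin/run me.sh" executable}')
print(MAIN)
print(PLAIN)
print(QUOTED)
```

With this pattern the braces must follow a language and whitespace, so a fence that carries only `{#id}` is treated as a language string rather than as attributes.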


@@ -0,0 +1,317 @@
"""Literate programming helpers for Markdown fenced code chunks."""
from __future__ import annotations
import hashlib
import re
import shlex
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any
from markdown_it import MarkdownIt
from markitect_tool.diagnostics import Diagnostic, SourceLocation
from markitect_tool.ops import OperationProvenance
@dataclass(frozen=True)
class CodeChunk:
"""A named fenced code chunk."""
chunk_id: str
content: str
language: str | None = None
target_path: str | None = None
references: list[str] = field(default_factory=list)
source_path: str | None = None
line_start: int | None = None
line_end: int | None = None
content_hash: str = ""
def to_dict(self) -> dict[str, Any]:
return {key: value for key, value in asdict(self).items() if value not in (None, [], "")}
@dataclass(frozen=True)
class LiterateFile:
"""One generated file from tangling."""
path: str
content: str
chunk_ids: list[str]
def to_dict(self) -> dict[str, Any]:
return asdict(self)
@dataclass(frozen=True)
class TangleResult:
"""Result of tangling Markdown code chunks."""
files: list[LiterateFile]
chunks: list[CodeChunk]
diagnostics: list[Diagnostic] = field(default_factory=list)
provenance: list[OperationProvenance] = field(default_factory=list)
@property
def valid(self) -> bool:
return not any(diagnostic.severity == "error" for diagnostic in self.diagnostics)
def to_dict(self) -> dict[str, Any]:
return {
"valid": self.valid,
"files": [file.to_dict() for file in self.files],
"chunks": [chunk.to_dict() for chunk in self.chunks],
"diagnostics": [diagnostic.to_dict() for diagnostic in self.diagnostics],
"provenance": [event.to_dict() for event in self.provenance],
}
@dataclass(frozen=True)
class WeaveResult:
"""Result of weaving Markdown documentation with a chunk index."""
markdown: str
chunks: list[CodeChunk]
def to_dict(self) -> dict[str, Any]:
return {
"markdown": self.markdown,
"chunks": [chunk.to_dict() for chunk in self.chunks],
}
_CHUNK_REF_RE = re.compile(r"<<(?P<id>[A-Za-z0-9_.:-]+)>>")
_CHUNK_LINE_REF_RE = re.compile(r"^(?P<indent>[ \t]*)<<(?P<id>[A-Za-z0-9_.:-]+)>>[ \t]*$", re.MULTILINE)
def discover_code_chunks(
markdown: str,
*,
source_path: str | Path | None = None,
) -> list[CodeChunk]:
"""Discover named fenced code chunks in Markdown order."""
parser = MarkdownIt("commonmark", {"tables": True}).enable("table")
chunks: list[CodeChunk] = []
used_ids: dict[str, int] = {}
for token in parser.parse(markdown):
if token.type != "fence":
continue
attrs = _parse_fence_info(token.info)
chunk_id = attrs.get("id")
if not chunk_id:
continue
chunk_id = _dedupe_id(_slug(chunk_id), used_ids)
line_start = token.map[0] + 1 if token.map else None
line_end = token.map[1] if token.map else None
chunks.append(
CodeChunk(
chunk_id=chunk_id,
content=token.content,
language=attrs.get("language"),
target_path=attrs.get("tangle") or attrs.get("target"),
references=_chunk_references(token.content),
source_path=str(source_path) if source_path else None,
line_start=line_start,
line_end=line_end,
content_hash=_hash_text(token.content),
)
)
return chunks
def tangle_markdown(
markdown: str,
*,
source_path: str | Path | None = None,
) -> TangleResult:
"""Tangle named chunks into target files."""
chunks = discover_code_chunks(markdown, source_path=source_path)
chunks_by_id = {chunk.chunk_id: chunk for chunk in chunks}
diagnostics: list[Diagnostic] = []
provenance: list[OperationProvenance] = []
target_chunks: dict[str, list[CodeChunk]] = {}
for chunk in chunks:
if chunk.target_path:
target_chunks.setdefault(chunk.target_path, []).append(chunk)
files: list[LiterateFile] = []
for target_path, grouped_chunks in target_chunks.items():
rendered_parts: list[str] = []
for chunk in grouped_chunks:
rendered_parts.append(_expand_chunk(chunk, chunks_by_id, diagnostics, []))
provenance.append(
OperationProvenance(
operation="literate.tangle",
source_path=chunk.source_path,
line_start=chunk.line_start,
line_end=chunk.line_end,
target_path=target_path,
dependencies=[chunk.source_path] if chunk.source_path else [],
metadata={"chunk_id": chunk.chunk_id, "references": chunk.references},
)
)
files.append(
LiterateFile(
path=target_path,
content=_join_tangled_parts(rendered_parts),
chunk_ids=[chunk.chunk_id for chunk in grouped_chunks],
)
)
return TangleResult(
files=files,
chunks=chunks,
diagnostics=diagnostics,
provenance=provenance,
)
def weave_markdown(
markdown: str,
*,
source_path: str | Path | None = None,
) -> WeaveResult:
"""Append a deterministic chunk index to human-readable Markdown."""
chunks = discover_code_chunks(markdown, source_path=source_path)
if not chunks:
return WeaveResult(markdown=markdown, chunks=[])
lines = [markdown.rstrip(), "", "## Code Chunk Index", ""]
for chunk in chunks:
target = f" -> `{chunk.target_path}`" if chunk.target_path else ""
refs = f"; refs: {', '.join(f'`{ref}`' for ref in chunk.references)}" if chunk.references else ""
location = f" line {chunk.line_start}" if chunk.line_start else ""
lines.append(f"- `{chunk.chunk_id}`{target}{refs}{location}")
return WeaveResult(markdown="\n".join(lines).rstrip() + "\n", chunks=chunks)
def write_tangle_files(result: TangleResult, output_dir: str | Path) -> list[str]:
"""Write tangled files under an output directory."""
root = Path(output_dir)
root.mkdir(parents=True, exist_ok=True)
written: list[str] = []
for file in result.files:
target = _safe_output_path(root, file.path)
target.parent.mkdir(parents=True, exist_ok=True)
target.write_text(file.content, encoding="utf-8")
written.append(str(target))
return written
def _expand_chunk(
    chunk: CodeChunk,
    chunks_by_id: dict[str, CodeChunk],
    diagnostics: list[Diagnostic],
    stack: list[str],
) -> str:
    if chunk.chunk_id in stack:
        diagnostics.append(
            Diagnostic(
                severity="error",
                code="literate.chunk_cycle",
                message="Cyclic chunk reference: " + " -> ".join(stack + [chunk.chunk_id]),
                source=SourceLocation(path=chunk.source_path, line=chunk.line_start),
            )
        )
        return f"<<{chunk.chunk_id}>>"
    next_stack = stack + [chunk.chunk_id]

    def expand(match: re.Match[str]) -> str:
        return _expand_reference(match.group("id"), chunks_by_id, diagnostics, next_stack, chunk)

    # Expand each line exactly once. An unresolved `<<id>>` left behind by a
    # whole-line reference must not be matched again by the inline pass,
    # otherwise the same missing or cyclic reference is reported twice.
    rendered_lines: list[str] = []
    for line in chunk.content.splitlines(keepends=True):
        line_body = line.rstrip("\r\n")
        line_ending = line[len(line_body) :]
        match = _CHUNK_LINE_REF_RE.match(line_body)
        if match:
            indent = match.group("indent")
            expanded = expand(match)
            line_body = "\n".join(
                f"{indent}{part}" if part else part for part in expanded.splitlines()
            )
        else:
            line_body = _CHUNK_REF_RE.sub(expand, line_body)
        rendered_lines.append(line_body + line_ending)
    return "".join(rendered_lines)
def _expand_reference(
chunk_id: str,
chunks_by_id: dict[str, CodeChunk],
diagnostics: list[Diagnostic],
stack: list[str],
source_chunk: CodeChunk,
) -> str:
referenced = chunks_by_id.get(chunk_id)
if not referenced:
diagnostics.append(
Diagnostic(
severity="error",
code="literate.missing_chunk",
message=f"Missing chunk reference `{chunk_id}`",
source=SourceLocation(path=source_chunk.source_path, line=source_chunk.line_start),
)
)
return f"<<{chunk_id}>>"
return _expand_chunk(referenced, chunks_by_id, diagnostics, stack)
def _join_tangled_parts(parts: list[str]) -> str:
rendered = "\n".join(part.rstrip("\n") for part in parts if part is not None)
return rendered.rstrip() + "\n" if rendered else ""
def _safe_output_path(root: Path, relative_path: str) -> Path:
path = Path(relative_path)
if path.is_absolute():
raise ValueError(f"Tangle target must be relative: {relative_path}")
resolved = (root / path).resolve()
try:
resolved.relative_to(root.resolve())
except ValueError as exc:
raise ValueError(f"Tangle target escapes output directory: {relative_path}") from exc
return resolved
def _parse_fence_info(info: str) -> dict[str, str]:
match = re.match(r"^(?P<language>[^\s{]+)?(?:\s+\{(?P<attrs>.*)\})?\s*$", info.strip())
if not match:
return {"language": info.strip()} if info.strip() else {}
attrs = _parse_attrs(match.group("attrs") or "")
language = match.group("language")
if language:
attrs["language"] = language
return attrs
def _parse_attrs(raw: str) -> dict[str, str]:
attrs: dict[str, str] = {}
for part in shlex.split(raw):
if part.startswith("#") and len(part) > 1:
attrs["id"] = part[1:]
continue
if "=" not in part:
attrs[part] = "true"
continue
key, value = part.split("=", 1)
attrs[key.strip()] = value.strip()
return attrs
def _chunk_references(content: str) -> list[str]:
return [match.group("id") for match in _CHUNK_REF_RE.finditer(content)]
def _dedupe_id(unit_id: str, used_ids: dict[str, int]) -> str:
count = used_ids.get(unit_id, 0) + 1
used_ids[unit_id] = count
return unit_id if count == 1 else f"{unit_id}-{count}"
def _slug(value: str) -> str:
slug = re.sub(r"[^a-z0-9_.:-]+", "-", value.strip().lower())
slug = re.sub(r"-+", "-", slug).strip("-")
return slug or "chunk"
def _hash_text(text: str) -> str:
return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
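The tangle step above expands `<<chunk-id>>` references recursively and keeps the indentation of a reference that sits on its own line. The following sketch shows that behaviour in isolation. It is deliberately simplified: the chunk table is hypothetical, only whole-line references are handled, and errors are raised instead of being collected as diagnostics.

```python
import re

LINE_REF = re.compile(r"^(?P<indent>[ \t]*)<<(?P<id>[A-Za-z0-9_.:-]+)>>[ \t]*$")

# Hypothetical chunk table: chunk id -> chunk content.
CHUNKS = {
    "main": "def main():\n    <<greet>>\n    return 0\n",
    "greet": "name = 'world'\nprint(f'hello {name}')\n",
}


def expand(chunk_id, stack=()):
    """Expand whole-line references, re-indenting the referenced chunk."""
    if chunk_id in stack:
        raise ValueError("cyclic chunk reference: " + " -> ".join(stack + (chunk_id,)))
    if chunk_id not in CHUNKS:
        raise KeyError(chunk_id)
    out = []
    for line in CHUNKS[chunk_id].splitlines():
        match = LINE_REF.match(line)
        if not match:
            out.append(line)
            continue
        indent = match.group("indent")
        for inner in expand(match.group("id"), stack + (chunk_id,)).splitlines():
            out.append(indent + inner if inner else inner)
    return "\n".join(out) + "\n"


TANGLED = expand("main")
print(TANGLED)
```

Passing the stack of chunk ids down the recursion is what lets a cycle be detected at the point where a chunk would re-enter itself.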


@@ -4,6 +4,7 @@ from markitect_tool.ops.engine import (
ComposeResult,
IncludeError,
IncludeResult,
OperationProvenance,
TransformResult,
compose_files,
resolve_includes,
@@ -14,6 +15,7 @@ __all__ = [
"ComposeResult",
"IncludeError",
"IncludeResult",
"OperationProvenance",
"TransformResult",
"compose_files",
"resolve_includes",


@@ -9,6 +9,7 @@ from pathlib import Path
from typing import Any
import yaml
from markdown_it import MarkdownIt
from markitect_tool.core import parse_markdown
from markitect_tool.query import extract_document
@@ -18,15 +19,46 @@ class IncludeError(ValueError):
"""Raised when include resolution cannot continue."""
@dataclass(frozen=True)
class OperationProvenance:
"""Structured provenance for deterministic Markdown operations."""
operation: str
source_path: str | None = None
line_start: int | None = None
line_end: int | None = None
target_path: str | None = None
dependencies: list[str] = field(default_factory=list)
metadata: dict[str, Any] = field(default_factory=dict)
def to_dict(self) -> dict[str, Any]:
data = {
"operation": self.operation,
"source_path": self.source_path,
"line_start": self.line_start,
"line_end": self.line_end,
"target_path": self.target_path,
"dependencies": self.dependencies or None,
"metadata": self.metadata or None,
}
return {key: value for key, value in data.items() if value is not None}
@dataclass(frozen=True)
class TransformResult:
"""Result of a deterministic Markdown transform."""
markdown: str
operations: list[str] = field(default_factory=list)
provenance: list[OperationProvenance] = field(default_factory=list)
     def to_dict(self) -> dict[str, Any]:
-        return asdict(self)
+        data: dict[str, Any] = {
+            "markdown": self.markdown,
+            "operations": self.operations,
+            "provenance": [event.to_dict() for event in self.provenance],
+        }
+        return {key: value for key, value in data.items() if value}
@dataclass(frozen=True)
@@ -46,9 +78,15 @@ class IncludeResult:
markdown: str
included_paths: list[str] = field(default_factory=list)
provenance: list[OperationProvenance] = field(default_factory=list)
     def to_dict(self) -> dict[str, Any]:
-        return asdict(self)
+        data: dict[str, Any] = {
+            "markdown": self.markdown,
+            "included_paths": self.included_paths,
+            "provenance": [event.to_dict() for event in self.provenance],
+        }
+        return {key: value for key, value in data.items() if value}
_COMMENT_INCLUDE_RE = re.compile(r"<!--\s*mkt:include\s+(?P<attrs>.*?)\s*-->", re.DOTALL)
@@ -68,15 +106,30 @@ def transform_markdown(
"""Apply deterministic operations to one Markdown document."""
operations: list[str] = []
provenance: list[OperationProvenance] = []
frontmatter, body = _split_frontmatter(markdown)
if set_frontmatter:
frontmatter = _deep_merge(frontmatter, set_frontmatter)
operations.append("set_frontmatter")
provenance.append(
OperationProvenance(
operation="set_frontmatter",
source_path=source_path,
metadata={"keys": sorted(set_frontmatter.keys())},
)
)
     if heading_delta:
-        body = shift_heading_levels(body, heading_delta)
+        body, affected_lines = _shift_heading_levels(body, heading_delta)
         operations.append(f"shift_headings:{heading_delta}")
+        provenance.append(
+            OperationProvenance(
+                operation="shift_headings",
+                source_path=source_path,
+                metadata={"delta": heading_delta, "affected_lines": affected_lines},
+            )
+        )
if extract_selector:
document_text = _join_frontmatter(frontmatter, body) if frontmatter else body
@@ -84,24 +137,71 @@ def transform_markdown(
body = "\n\n".join(extract_document(document, extract_selector))
frontmatter = {}
operations.append(f"extract:{extract_selector}")
provenance.append(
OperationProvenance(
operation="extract",
source_path=source_path,
metadata={"selector": extract_selector},
)
)
if strip_frontmatter:
frontmatter = {}
operations.append("strip_frontmatter")
provenance.append(
OperationProvenance(
operation="strip_frontmatter",
source_path=source_path,
)
)
-    return TransformResult(markdown=_join_frontmatter(frontmatter, body), operations=operations)
+    return TransformResult(
+        markdown=_join_frontmatter(frontmatter, body),
+        operations=operations,
+        provenance=provenance,
+    )
 def shift_heading_levels(markdown: str, delta: int) -> str:
     """Shift ATX heading levels by delta while clamping to levels 1 through 6."""
-    def replace(match: re.Match[str]) -> str:
+    shifted, _affected_lines = _shift_heading_levels(markdown, delta)
+    return shifted
+def _shift_heading_levels(markdown: str, delta: int) -> tuple[str, list[int]]:
+    ignored_lines = _code_line_numbers(markdown)
+    affected_lines: list[int] = []
+    rendered_lines: list[str] = []
+    for line_number, line in enumerate(markdown.splitlines(keepends=True), start=1):
+        if line_number in ignored_lines:
+            rendered_lines.append(line)
+            continue
+        line_body = line.rstrip("\r\n")
+        line_ending = line[len(line_body) :]
+        match = _HEADING_RE.match(line_body)
+        if not match:
+            rendered_lines.append(line)
+            continue
         marks = match.group(1)
         suffix = match.group(2)
         level = min(max(len(marks) + delta, 1), 6)
-        return f"{'#' * level}{suffix}"
+        rendered_lines.append(f"{'#' * level}{suffix}{line_ending}")
+        affected_lines.append(line_number)
-    return _HEADING_RE.sub(replace, markdown)
+    return "".join(rendered_lines), affected_lines
def _code_line_numbers(markdown: str) -> set[int]:
parser = MarkdownIt("commonmark", {"tables": True}).enable("table")
ignored_lines: set[int] = set()
for token in parser.parse(markdown):
if token.type not in {"fence", "code_block"} or not token.map:
continue
start, end = token.map
ignored_lines.update(range(start + 1, end + 1))
return ignored_lines
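The clamping and code-skipping behaviour of `_shift_heading_levels` can be tried in isolation. The following is a minimal stdlib-only sketch, not the module's code: `HEADING_RE` is an assumed stand-in for `_HEADING_RE` (whose definition is not shown in this diff), and a naive fence toggle replaces the MarkdownIt-based `_code_line_numbers`.

```python
import re

# Assumed pattern: 1-6 '#' marks followed by whitespace and text, or nothing.
HEADING_RE = re.compile(r"^(#{1,6})(\s.*|)$")


def shift_headings(markdown: str, delta: int) -> tuple[str, list[int]]:
    """Shift ATX headings by delta, clamped to levels 1..6, skipping fenced code."""
    rendered: list[str] = []
    affected: list[int] = []
    in_fence = False
    for line_number, line in enumerate(markdown.splitlines(keepends=True), start=1):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line ending
        if body.lstrip().startswith("```"):
            # Naive fence tracking (an assumption; the real code asks the parser).
            in_fence = not in_fence
            rendered.append(line)
            continue
        match = None if in_fence else HEADING_RE.match(body)
        if not match:
            rendered.append(line)
            continue
        level = min(max(len(match.group(1)) + delta, 1), 6)
        rendered.append(f"{'#' * level}{match.group(2)}{ending}")
        affected.append(line_number)
    return "".join(rendered), affected
```

Returning the affected line numbers alongside the text is what lets the caller record them in provenance metadata without re-scanning the document.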
def compose_files(
@@ -154,18 +254,22 @@ def resolve_includes(
root = Path(base_dir).resolve()
stack = [Path(current_path).resolve()] if current_path else []
included: list[Path] = []
provenance: list[OperationProvenance] = []
resolved = _resolve_include_text(
markdown,
root=root,
current_dir=Path(current_path).resolve().parent if current_path else root,
source_path=Path(current_path).resolve() if current_path else None,
stack=stack,
included=included,
provenance=provenance,
depth=0,
max_depth=max_depth,
)
return IncludeResult(
markdown=resolved,
included_paths=[str(path) for path in included],
provenance=provenance,
)
@@ -174,34 +278,73 @@ def _resolve_include_text(
*,
root: Path,
current_dir: Path,
source_path: Path | None,
stack: list[Path],
included: list[Path],
provenance: list[OperationProvenance],
depth: int,
max_depth: int,
) -> str:
if depth > max_depth:
raise IncludeError(f"Include depth exceeded max_depth={max_depth}")
-def replace_comment(match: re.Match[str]) -> str:
-attrs = _parse_include_attrs(match.group("attrs"))
-return _render_include(attrs, root, current_dir, stack, included, depth, max_depth)
+ignored_lines = _code_line_numbers(markdown)
+rendered_lines: list[str] = []
-def replace_brace(match: re.Match[str]) -> str:
-attrs = {"path": match.group("path").strip()}
-return _render_include(attrs, root, current_dir, stack, included, depth, max_depth)
+for line_number, line in enumerate(markdown.splitlines(keepends=True), start=1):
+if line_number in ignored_lines:
+rendered_lines.append(line)
+continue
-markdown = _COMMENT_INCLUDE_RE.sub(replace_comment, markdown)
-return _BRACE_INCLUDE_RE.sub(replace_brace, markdown)
def replace_comment(match: re.Match[str]) -> str:
attrs = _parse_include_attrs(match.group("attrs"))
return _render_include(
attrs,
root,
current_dir,
source_path,
stack,
included,
provenance,
depth,
max_depth,
marker_line=line_number,
)
def replace_brace(match: re.Match[str]) -> str:
attrs = {"path": match.group("path").strip()}
return _render_include(
attrs,
root,
current_dir,
source_path,
stack,
included,
provenance,
depth,
max_depth,
marker_line=line_number,
)
line = _COMMENT_INCLUDE_RE.sub(replace_comment, line)
line = _BRACE_INCLUDE_RE.sub(replace_brace, line)
rendered_lines.append(line)
return "".join(rendered_lines)
def _render_include(
attrs: dict[str, str],
root: Path,
current_dir: Path,
source_path: Path | None,
stack: list[Path],
included: list[Path],
provenance: list[OperationProvenance],
depth: int,
max_depth: int,
*,
marker_line: int,
) -> str:
raw_path = attrs.get("path")
if not raw_path:
@@ -228,12 +371,33 @@ def _render_include(
body = shift_heading_levels(body, heading_delta)
included.append(include_path)
provenance.append(
OperationProvenance(
operation="include",
source_path=str(source_path) if source_path else None,
line_start=marker_line,
line_end=marker_line,
target_path=str(include_path),
dependencies=[str(include_path)],
metadata={
key: value
for key, value in {
"selector": selector,
"heading_delta": heading_delta if heading_delta else None,
"include_frontmatter": attrs.get("include_frontmatter"),
}.items()
if value is not None
},
)
)
return _resolve_include_text(
body.strip(),
root=root,
current_dir=include_path.parent,
source_path=include_path,
stack=stack + [include_path],
included=included,
provenance=provenance,
depth=depth + 1,
max_depth=max_depth,
)

@@ -0,0 +1,27 @@
"""Deterministic fenced-block processor registry."""
from markitect_tool.processor.engine import (
FencedProcessorBlock,
ProcessorContext,
ProcessorOutputFile,
ProcessorRegistry,
ProcessorRequest,
ProcessorResult,
ProcessorRun,
default_processor_registry,
discover_fenced_processors,
run_fenced_processors,
)
__all__ = [
"FencedProcessorBlock",
"ProcessorContext",
"ProcessorOutputFile",
"ProcessorRegistry",
"ProcessorRequest",
"ProcessorResult",
"ProcessorRun",
"default_processor_registry",
"discover_fenced_processors",
"run_fenced_processors",
]

@@ -0,0 +1,374 @@
"""Processor API for deterministic fenced-block workflows."""
from __future__ import annotations
import hashlib
import re
import shlex
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Callable
from markdown_it import MarkdownIt
from markitect_tool.diagnostics import Diagnostic, SourceLocation
from markitect_tool.ops import OperationProvenance
from markitect_tool.reference import (
ReferenceContext,
ReferenceResolutionError,
resolve_reference,
)
ProcessorCallable = Callable[["ProcessorRequest"], "ProcessorResult"]
@dataclass(frozen=True)
class FencedProcessorBlock:
"""A fenced Markdown block that opted into processor handling."""
processor: str
content: str
unit_id: str
attrs: dict[str, str]
language: str | None = None
source_path: str | None = None
line_start: int | None = None
line_end: int | None = None
content_hash: str = ""
def to_dict(self) -> dict[str, Any]:
return {key: value for key, value in asdict(self).items() if value not in (None, {}, "")}
@dataclass(frozen=True)
class ProcessorContext:
"""Execution context passed to deterministic processors."""
root: Path = Path(".")
current_path: Path | None = None
namespaces: dict[str, str] = field(default_factory=dict)
variables: dict[str, Any] = field(default_factory=dict)
policy: dict[str, Any] = field(default_factory=dict)
def reference_context(self) -> ReferenceContext:
return ReferenceContext(
root=self.root,
current_path=self.current_path,
namespaces=self.namespaces,
)
def to_dict(self) -> dict[str, Any]:
data = {
"root": str(self.root),
"current_path": str(self.current_path) if self.current_path else None,
"namespaces": self.namespaces,
"variables": self.variables,
"policy": self.policy,
}
return {key: value for key, value in data.items() if value not in (None, {}, "")}
@dataclass(frozen=True)
class ProcessorRequest:
"""One processor invocation."""
block: FencedProcessorBlock
context: ProcessorContext
@dataclass(frozen=True)
class ProcessorOutputFile:
"""A generated file requested by a processor."""
path: str
content: str
def to_dict(self) -> dict[str, Any]:
return asdict(self)
@dataclass(frozen=True)
class ProcessorResult:
"""Deterministic processor result envelope."""
content: str | None = None
files: list[ProcessorOutputFile] = field(default_factory=list)
diagnostics: list[Diagnostic] = field(default_factory=list)
dependencies: list[str] = field(default_factory=list)
provenance: list[OperationProvenance] = field(default_factory=list)
@property
def valid(self) -> bool:
return not any(diagnostic.severity == "error" for diagnostic in self.diagnostics)
def to_dict(self) -> dict[str, Any]:
data = {
"valid": self.valid,
"content": self.content,
"files": [file.to_dict() for file in self.files],
"diagnostics": [diagnostic.to_dict() for diagnostic in self.diagnostics],
"dependencies": self.dependencies,
"provenance": [event.to_dict() for event in self.provenance],
}
return {key: value for key, value in data.items() if value not in (None, [], {})}
@dataclass(frozen=True)
class ProcessorRun:
"""Results from running all processor blocks in a document."""
source_path: str | None
blocks: list[FencedProcessorBlock]
results: list[ProcessorResult]
@property
def valid(self) -> bool:
return all(result.valid for result in self.results)
def to_dict(self) -> dict[str, Any]:
return {
"valid": self.valid,
"source_path": self.source_path,
"count": len(self.results),
"blocks": [block.to_dict() for block in self.blocks],
"results": [result.to_dict() for result in self.results],
}
class ProcessorRegistry:
"""Explicit registry for deterministic fenced-block processors."""
def __init__(self) -> None:
self._processors: dict[str, ProcessorCallable] = {}
def register(self, name: str, processor: ProcessorCallable) -> None:
key = _slug(name)
if not key:
raise ValueError("Processor name cannot be empty")
self._processors[key] = processor
def names(self) -> list[str]:
return sorted(self._processors)
def run(self, request: ProcessorRequest) -> ProcessorResult:
processor = self._processors.get(_slug(request.block.processor))
if processor is None:
return ProcessorResult(
diagnostics=[
Diagnostic(
severity="error",
code="processor.unknown",
message=f"Unknown processor `{request.block.processor}`",
source=SourceLocation(
path=request.block.source_path,
line=request.block.line_start,
),
)
]
)
return processor(request)
def default_processor_registry() -> ProcessorRegistry:
"""Create the default deterministic processor registry."""
registry = ProcessorRegistry()
registry.register("identity", _identity_processor)
registry.register("uppercase", _uppercase_processor)
registry.register("include", _include_processor)
return registry
def discover_fenced_processors(
markdown: str,
*,
source_path: str | Path | None = None,
) -> list[FencedProcessorBlock]:
"""Discover fenced blocks that explicitly opt into processor handling."""
parser = MarkdownIt("commonmark", {"tables": True}).enable("table")
blocks: list[FencedProcessorBlock] = []
used_ids: dict[str, int] = {}
for index, token in enumerate(parser.parse(markdown)):
if token.type != "fence":
continue
attrs = _parse_fence_info(token.info)
processor = _processor_name(attrs)
if not processor:
continue
unit_id = _dedupe_id(_slug(attrs.get("id") or f"{processor}-{index}"), used_ids)
line_start = token.map[0] + 1 if token.map else None
line_end = token.map[1] if token.map else None
blocks.append(
FencedProcessorBlock(
processor=processor,
content=token.content,
unit_id=unit_id,
attrs={
key: value
for key, value in attrs.items()
if key not in {"id", "language", "processor"}
},
language=attrs.get("language"),
source_path=str(source_path) if source_path else None,
line_start=line_start,
line_end=line_end,
content_hash=_hash_text(token.content),
)
)
return blocks
def run_fenced_processors(
markdown: str,
*,
context: ProcessorContext,
registry: ProcessorRegistry | None = None,
source_path: str | Path | None = None,
) -> ProcessorRun:
"""Run all processor-marked fenced blocks in document order."""
active_registry = registry or default_processor_registry()
blocks = discover_fenced_processors(markdown, source_path=source_path or context.current_path)
results = [
active_registry.run(ProcessorRequest(block=block, context=context))
for block in blocks
]
return ProcessorRun(
source_path=str(source_path or context.current_path) if source_path or context.current_path else None,
blocks=blocks,
results=results,
)
def _identity_processor(request: ProcessorRequest) -> ProcessorResult:
return ProcessorResult(
content=request.block.content,
provenance=[
OperationProvenance(
operation="processor.identity",
source_path=request.block.source_path,
line_start=request.block.line_start,
line_end=request.block.line_end,
metadata={"unit_id": request.block.unit_id},
)
],
)
def _uppercase_processor(request: ProcessorRequest) -> ProcessorResult:
return ProcessorResult(
content=request.block.content.upper(),
provenance=[
OperationProvenance(
operation="processor.uppercase",
source_path=request.block.source_path,
line_start=request.block.line_start,
line_end=request.block.line_end,
metadata={"unit_id": request.block.unit_id},
)
],
)
def _include_processor(request: ProcessorRequest) -> ProcessorResult:
reference = request.block.attrs.get("ref")
if not reference:
return ProcessorResult(
diagnostics=[
Diagnostic(
severity="error",
code="processor.include.missing_ref",
message="Include processor requires a `ref` attribute",
source=SourceLocation(
path=request.block.source_path,
line=request.block.line_start,
),
)
]
)
try:
resolution = resolve_reference(reference, context=request.context.reference_context())
except ReferenceResolutionError as exc:
return ProcessorResult(
diagnostics=[
Diagnostic(
severity="error",
code="processor.include.reference_error",
message=str(exc),
source=SourceLocation(
path=request.block.source_path,
line=request.block.line_start,
),
)
]
)
content = "\n\n".join(unit.text for unit in resolution.units)
return ProcessorResult(
content=content,
dependencies=[resolution.target_path],
provenance=[
OperationProvenance(
operation="processor.include",
source_path=request.block.source_path,
line_start=request.block.line_start,
line_end=request.block.line_end,
target_path=resolution.target_path,
dependencies=[resolution.target_path],
metadata={"ref": reference, "unit_ids": [unit.unit_id for unit in resolution.units]},
)
],
)
def _processor_name(attrs: dict[str, str]) -> str | None:
if "processor" in attrs:
return attrs["processor"]
language = attrs.get("language", "")
if language.startswith("mkt-"):
return language.removeprefix("mkt-")
if language == "mkt" and "type" in attrs:
return attrs["type"]
return None
def _parse_fence_info(info: str) -> dict[str, str]:
match = re.match(r"^(?P<language>[^\s{]+)?(?:\s+\{(?P<attrs>.*)\})?\s*$", info.strip())
if not match:
return {"language": info.strip()} if info.strip() else {}
attrs = _parse_attrs(match.group("attrs") or "")
language = match.group("language")
if language:
attrs["language"] = language
return attrs
def _parse_attrs(raw: str) -> dict[str, str]:
attrs: dict[str, str] = {}
for part in shlex.split(raw):
if part.startswith("#") and len(part) > 1:
attrs["id"] = part[1:]
continue
if "=" not in part:
attrs[part] = "true"
continue
key, value = part.split("=", 1)
attrs[key.strip()] = value.strip()
return attrs
def _dedupe_id(unit_id: str, used_ids: dict[str, int]) -> str:
count = used_ids.get(unit_id, 0) + 1
used_ids[unit_id] = count
return unit_id if count == 1 else f"{unit_id}-{count}"
def _slug(value: str) -> str:
slug = re.sub(r"[^a-z0-9_.:-]+", "-", value.strip().lower())
slug = re.sub(r"-+", "-", slug).strip("-")
return slug
def _hash_text(text: str) -> str:
return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()

@@ -0,0 +1,25 @@
"""Namespaced content reference resolution for Markdown artifacts."""
from markitect_tool.reference.engine import (
ContentUnit,
ReferenceAddress,
ReferenceContext,
ReferenceResolution,
ReferenceResolutionError,
SourceSpan,
load_namespaces,
parse_reference,
resolve_reference,
)
__all__ = [
"ContentUnit",
"ReferenceAddress",
"ReferenceContext",
"ReferenceResolution",
"ReferenceResolutionError",
"SourceSpan",
"load_namespaces",
"parse_reference",
"resolve_reference",
]

@@ -0,0 +1,626 @@
"""Reference parsing and resolution for Markdown content units."""
from __future__ import annotations
import hashlib
import re
import shlex
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any
from markdown_it import MarkdownIt
from markitect_tool.core import ContentBlock, Document, Heading, Section, parse_markdown
from markitect_tool.query import InvalidQueryError, QueryMatch, query_document
class ReferenceResolutionError(ValueError):
"""Raised when a content reference cannot be resolved."""
@dataclass(frozen=True)
class ReferenceAddress:
"""Parsed content reference address.
Syntax is intentionally compact and Markdown-friendly:
- ``path/to/file.md``
- ``std:clauses/payment.md``
- ``std:clauses/payment.md#section:terms``
- ``std:clauses/payment.md::sections[heading=Terms]``
- ``#intro`` for a fragment in the current document
"""
raw: str
namespace: str | None = None
address: str = ""
fragment: str | None = None
selector: str | None = None
def to_dict(self) -> dict[str, Any]:
return {
key: value
for key, value in asdict(self).items()
if value is not None and value != ""
}
@dataclass(frozen=True)
class ReferenceContext:
"""Inputs used to resolve namespaced and relative content references."""
root: Path = Path(".")
current_path: Path | None = None
namespaces: dict[str, str] = field(default_factory=dict)
@classmethod
def from_document(
cls,
document: Document,
*,
root: str | Path = ".",
current_path: str | Path | None = None,
) -> "ReferenceContext":
"""Build a reference context from document frontmatter."""
source_path = current_path or document.source_path
return cls(
root=Path(root),
current_path=Path(source_path) if source_path else None,
namespaces=load_namespaces(document.frontmatter),
)
def to_dict(self) -> dict[str, Any]:
data = {
"root": str(self.root),
"current_path": str(self.current_path) if self.current_path else None,
"namespaces": self.namespaces,
}
return {key: value for key, value in data.items() if value is not None}
@dataclass(frozen=True)
class SourceSpan:
"""Line span for a resolved unit in its source file."""
line_start: int | None = None
line_end: int | None = None
def to_dict(self) -> dict[str, Any]:
return {key: value for key, value in asdict(self).items() if value is not None}
@dataclass(frozen=True)
class ContentUnit:
"""One addressable content unit resolved from Markdown."""
kind: str
unit_id: str
text: str
source_path: str
span: SourceSpan | None = None
name: str | None = None
content_hash: str = ""
metadata: dict[str, Any] = field(default_factory=dict)
def to_dict(self) -> dict[str, Any]:
data = {
"kind": self.kind,
"unit_id": self.unit_id,
"name": self.name,
"source_path": self.source_path,
"span": self.span.to_dict() if self.span else None,
"content_hash": self.content_hash,
"metadata": self.metadata or None,
"text": self.text,
}
return {key: value for key, value in data.items() if value is not None}
@dataclass(frozen=True)
class ReferenceResolution:
"""Resolved content reference and its dependency edge."""
reference: ReferenceAddress
source_path: str
target_path: str
units: list[ContentUnit]
def to_dict(self) -> dict[str, Any]:
return {
"reference": self.reference.to_dict(),
"source_path": self.source_path,
"target_path": self.target_path,
"count": len(self.units),
"units": [unit.to_dict() for unit in self.units],
}
_NAMESPACE_RE = re.compile(r"^(?P<namespace>[A-Za-z][A-Za-z0-9_.-]*):(?P<address>.*)$")
_HEADING_ID_RE = re.compile(r"^(?P<title>.*?)(?:\s+\{#(?P<id>[A-Za-z0-9_.:-]+)\})?$")
_REGION_OPEN_RE = re.compile(r"<!--\s*mkt:region\s+(?P<attrs>.*?)\s*-->")
_REGION_CLOSE_RE = re.compile(r"<!--\s*/mkt:region\s*-->")
_FENCE_ATTRS_RE = re.compile(r"^(?P<language>[^\s{]+)?(?:\s+\{(?P<attrs>.*)\})?\s*$")
def parse_reference(reference: str) -> ReferenceAddress:
"""Parse a compact Markitect content reference."""
raw = reference.strip()
if not raw:
raise ReferenceResolutionError("Reference cannot be empty")
selector: str | None = None
base = raw
if "::" in base:
base, selector = base.split("::", 1)
selector = selector.strip()
if not selector:
raise ReferenceResolutionError(f"Reference selector is empty in `{reference}`")
fragment: str | None = None
if "#" in base:
base, fragment = base.split("#", 1)
fragment = fragment.strip()
if not fragment:
raise ReferenceResolutionError(f"Reference fragment is empty in `{reference}`")
namespace: str | None = None
address = base.strip()
match = _NAMESPACE_RE.match(address)
if match and "/" not in match.group("namespace") and "\\" not in match.group("namespace"):
namespace = match.group("namespace")
address = match.group("address").strip()
return ReferenceAddress(
raw=raw,
namespace=namespace,
address=address,
fragment=fragment,
selector=selector,
)
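The split order in `parse_reference` matters: the `::` selector is peeled off first, then the `#` fragment, and only the remaining base is checked for a namespace prefix. The following is a simplified stand-alone sketch of that order. `ParsedReference` and `split_reference` are hypothetical names, and the sketch raises plain `ValueError` instead of `ReferenceResolutionError`.

```python
import re
from typing import NamedTuple, Optional

NAMESPACE_RE = re.compile(r"^(?P<namespace>[A-Za-z][A-Za-z0-9_.-]*):(?P<address>.*)$")


class ParsedReference(NamedTuple):
    namespace: Optional[str]
    address: str
    fragment: Optional[str]
    selector: Optional[str]


def split_reference(reference: str) -> ParsedReference:
    base = reference.strip()
    if not base:
        raise ValueError("Reference cannot be empty")
    selector = None
    if "::" in base:  # selector first, so it may itself contain '#'
        base, selector = base.split("::", 1)
        selector = selector.strip()
        if not selector:
            raise ValueError("Reference selector is empty")
    fragment = None
    if "#" in base:
        base, fragment = base.split("#", 1)
        fragment = fragment.strip()
        if not fragment:
            raise ValueError("Reference fragment is empty")
    namespace = None
    address = base.strip()
    match = NAMESPACE_RE.match(address)
    if match:
        namespace = match.group("namespace")
        address = match.group("address").strip()
    return ParsedReference(namespace, address, fragment, selector)


for ref in ("std:clauses/payment.md#section:terms", "#intro", "path/to/file.md"):
    print(ref, "->", split_reference(ref))
```

Because the fragment is split off before the namespace check, the colon inside `section:terms` is never mistaken for a namespace separator.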
def load_namespaces(frontmatter: dict[str, Any]) -> dict[str, str]:
"""Load namespace mappings from Markdown frontmatter."""
raw_namespaces = frontmatter.get("namespaces", {})
if raw_namespaces is None:
return {}
if not isinstance(raw_namespaces, dict):
raise ReferenceResolutionError("Frontmatter `namespaces` must be a mapping")
namespaces: dict[str, str] = {}
for raw_key, raw_value in raw_namespaces.items():
key = str(raw_key).strip().rstrip(":")
if not key:
raise ReferenceResolutionError("Namespace keys cannot be empty")
if not _NAMESPACE_RE.match(f"{key}:"):
raise ReferenceResolutionError(f"Invalid namespace key `{raw_key}`")
if not isinstance(raw_value, str):
raise ReferenceResolutionError(f"Namespace `{key}` must map to a string path")
value = raw_value.strip()
if not value:
raise ReferenceResolutionError(f"Namespace `{key}` cannot map to an empty path")
namespaces[key] = value
return namespaces
def resolve_reference(
reference: str | ReferenceAddress,
*,
context: ReferenceContext,
) -> ReferenceResolution:
"""Resolve a content reference to one or more content units."""
address = parse_reference(reference) if isinstance(reference, str) else reference
root = context.root.resolve()
source_path = context.current_path.resolve() if context.current_path else root
target_path = _resolve_target_path(address, context, root, source_path)
if not target_path.exists() or not target_path.is_file():
raise ReferenceResolutionError(f"Referenced file not found: {target_path}")
markdown = target_path.read_text(encoding="utf-8")
document = parse_markdown(markdown, source_path=str(target_path))
if address.selector and address.fragment:
raise ReferenceResolutionError("Reference cannot use both fragment and selector")
if address.selector:
units = _units_from_selector(document, address.selector, target_path)
elif address.fragment:
units = _units_from_fragment(document, address.fragment, target_path, markdown)
else:
units = [_document_unit(document, target_path, markdown)]
if not units:
raise ReferenceResolutionError(f"Reference `{address.raw}` did not match any content units")
return ReferenceResolution(
reference=address,
source_path=str(source_path),
target_path=str(target_path),
units=units,
)
def _resolve_target_path(
address: ReferenceAddress,
context: ReferenceContext,
root: Path,
source_path: Path,
) -> Path:
if address.namespace:
if address.namespace not in context.namespaces:
raise ReferenceResolutionError(f"Unknown namespace `{address.namespace}`")
namespace_target = _path_from_namespace(context.namespaces[address.namespace], root)
candidate = namespace_target / address.address if namespace_target.is_dir() else namespace_target
elif address.address:
base_dir = source_path.parent if source_path.is_file() else root
candidate = Path(address.address)
candidate = candidate if candidate.is_absolute() else base_dir / candidate
elif context.current_path:
candidate = context.current_path
else:
raise ReferenceResolutionError("Pathless references require a current document")
resolved = candidate.resolve()
try:
resolved.relative_to(root)
except ValueError as exc:
raise ReferenceResolutionError(f"Reference escapes root: {address.raw}") from exc
return resolved
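The root-escape guard at the end of `_resolve_target_path` relies on `Path.resolve()` followed by `relative_to()`. A minimal sketch of that check on its own, with `is_inside_root` as a hypothetical helper name and invented example paths:

```python
from pathlib import Path


def is_inside_root(candidate: Path, root: Path) -> bool:
    """Return True when candidate, after resolving '..' and symlinks, stays under root."""
    resolved_root = root.resolve()
    try:
        # relative_to raises ValueError when the path is not below resolved_root.
        candidate.resolve().relative_to(resolved_root)
    except ValueError:
        return False
    return True


root = Path("project")
print(is_inside_root(root / "docs" / "a.md", root))
print(is_inside_root(root / ".." / "secrets.md", root))
```

Resolving both sides before comparing is the important part: a purely textual prefix check would accept `project/../secrets.md`.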
def _path_from_namespace(raw_path: str, root: Path) -> Path:
path = Path(raw_path)
if not path.is_absolute():
path = root / path
return path.resolve()
def _units_from_selector(
document: Document,
selector: str,
target_path: Path,
) -> list[ContentUnit]:
try:
matches = query_document(document, selector)
except InvalidQueryError as exc:
raise ReferenceResolutionError(str(exc)) from exc
return [_unit_from_query_match(match, target_path) for match in matches]
def _units_from_fragment(
document: Document,
fragment: str,
target_path: Path,
markdown: str,
) -> list[ContentUnit]:
kind, _, value = fragment.partition(":")
if not value:
kind, value = "id", kind
lookup = _slug(value)
if kind == "document":
return [_document_unit(document, target_path, markdown)]
if kind == "id":
for units in [
_section_units(document, target_path),
_region_units(markdown, target_path),
_fenced_block_units(markdown, target_path),
_heading_units(document, target_path),
]:
matches = [
unit for unit in units if unit.unit_id == lookup or _slug(unit.name or "") == lookup
]
if matches:
return matches
return []
if kind == "section":
sections = _section_units(document, target_path)
return [unit for unit in sections if unit.unit_id == lookup or _slug(unit.name or "") == lookup]
if kind == "heading":
headings = _heading_units(document, target_path)
return [unit for unit in headings if unit.unit_id == lookup or _slug(unit.name or "") == lookup]
if kind == "block":
return _block_fragment_units(document, target_path, value)
if kind == "region":
return [unit for unit in _region_units(markdown, target_path) if unit.unit_id == lookup]
if kind == "fence":
return [unit for unit in _fenced_block_units(markdown, target_path) if unit.unit_id == lookup]
if kind == "tag":
return [
unit
for unit in _region_units(markdown, target_path) + _fenced_block_units(markdown, target_path)
if lookup in {_slug(tag) for tag in unit.metadata.get("tags", [])}
]
if kind == "line":
return _line_range_units(markdown, target_path, value)
raise ReferenceResolutionError(f"Unsupported reference fragment kind `{kind}`")
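The first two statements of `_units_from_fragment` decide how a fragment is read: `kind:value` when a value follows the colon, otherwise a bare ID. A small sketch of just that rule (`split_fragment` is a hypothetical name):

```python
def split_fragment(fragment: str) -> tuple[str, str]:
    """Split `kind:value`; a fragment with no value is treated as an id lookup."""
    kind, _, value = fragment.partition(":")
    if not value:
        kind, value = "id", kind
    return kind, value


for fragment in ("section:terms", "intro", "line:3-7", "section:"):
    print(fragment, "->", split_fragment(fragment))
```

One consequence worth knowing: a trailing colon with nothing after it, as in `section:`, falls back to an ID lookup for `section` rather than raising.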
def _document_unit(document: Document, target_path: Path, markdown: str) -> ContentUnit:
unit_id = _slug(str(document.frontmatter.get("id") or target_path.stem))
return _content_unit(
kind="document",
unit_id=unit_id,
text=markdown,
source_path=target_path,
span=SourceSpan(1, len(markdown.splitlines())),
name=str(document.frontmatter.get("title") or target_path.stem),
metadata={"frontmatter": document.frontmatter},
)
def _unit_from_query_match(match: QueryMatch, target_path: Path) -> ContentUnit:
unit_id = _slug(match.path.replace("$.", "").replace("[", "-").replace("]", ""))
name = match.text.splitlines()[0].lstrip("# ").strip() if match.text else match.kind
return _content_unit(
kind=match.kind,
unit_id=unit_id,
text=match.text if match.text is not None else str(match.value),
source_path=target_path,
span=SourceSpan(match.line, None),
name=name,
metadata={"query_path": match.path, "value": match.value},
)
def _section_units(document: Document, target_path: Path) -> list[ContentUnit]:
used_ids: dict[str, int] = {}
return [
_section_unit(section, target_path, used_ids)
for section in document.sections
]
def _section_unit(
section: Section,
target_path: Path,
used_ids: dict[str, int],
) -> ContentUnit:
title, explicit_id = _heading_title_and_id(section.heading)
unit_id = _dedupe_id(_slug(explicit_id or title), used_ids)
line_end = section.blocks[-1].line_end if section.blocks else section.heading.line
lines = [f"{'#' * section.heading.level} {section.heading.text}"]
for block in section.blocks:
if block.text:
lines.extend(["", block.text])
return _content_unit(
kind="section",
unit_id=unit_id,
text="\n".join(lines).strip(),
source_path=target_path,
span=SourceSpan(section.heading.line, line_end),
name=title,
metadata={"heading_level": section.heading.level},
)
def _heading_units(document: Document, target_path: Path) -> list[ContentUnit]:
used_ids: dict[str, int] = {}
units: list[ContentUnit] = []
for heading in document.headings:
title, explicit_id = _heading_title_and_id(heading)
unit_id = _dedupe_id(_slug(explicit_id or title), used_ids)
units.append(
_content_unit(
kind="heading",
unit_id=unit_id,
text=f"{'#' * heading.level} {heading.text}",
source_path=target_path,
span=SourceSpan(heading.line, heading.line),
name=title,
metadata={"heading_level": heading.level},
)
)
return units
def _block_fragment_units(
document: Document,
target_path: Path,
value: str,
) -> list[ContentUnit]:
blocks = _block_units(document.blocks, target_path)
if value.isdigit():
index = int(value)
return [blocks[index]] if 0 <= index < len(blocks) else []
lookup = _slug(value)
return [unit for unit in blocks if unit.unit_id == lookup]
def _block_units(blocks: list[ContentBlock], target_path: Path) -> list[ContentUnit]:
used_ids: dict[str, int] = {}
units: list[ContentUnit] = []
for index, block in enumerate(blocks):
base_id = f"{block.type}-{block.line_start or index}"
units.append(
_content_unit(
kind=block.type,
unit_id=_dedupe_id(_slug(base_id), used_ids),
text=block.text,
source_path=target_path,
span=SourceSpan(block.line_start, block.line_end),
name=block.type,
metadata={"block_index": index},
)
)
return units
def _region_units(markdown: str, target_path: Path) -> list[ContentUnit]:
lines = markdown.splitlines()
units: list[ContentUnit] = []
open_region: tuple[int, str, list[str]] | None = None
for index, line in enumerate(lines, start=1):
open_match = _REGION_OPEN_RE.search(line)
close_match = _REGION_CLOSE_RE.search(line)
if open_match and open_region is not None:
raise ReferenceResolutionError("Nested mkt:region blocks are not supported")
if close_match:
if open_region is None:
raise ReferenceResolutionError("Region close marker has no matching open marker")
start_line, region_id, tags = open_region
content_lines = lines[start_line:index - 1]
units.append(
_content_unit(
kind="region",
unit_id=_slug(region_id),
text="\n".join(content_lines).strip(),
source_path=target_path,
span=SourceSpan(start_line, index),
name=region_id,
metadata={"tags": tags},
)
)
open_region = None
continue
if open_match:
attrs = _parse_attrs(open_match.group("attrs"))
region_id = attrs.get("id")
if not region_id:
raise ReferenceResolutionError("Region marker requires an id attribute")
open_region = (index, region_id, _tags_from_attrs(attrs))
if open_region is not None:
raise ReferenceResolutionError("Region open marker has no matching close marker")
return units
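The region scan is a single pass with one piece of state, the currently open region. The sketch below reproduces that loop with the same two marker patterns. It is a simplification: it returns plain tuples instead of `ContentUnit` objects, only understands `key=value` attributes, and raises `ValueError`. `find_regions` is a hypothetical name.

```python
import re
import shlex

REGION_OPEN_RE = re.compile(r"<!--\s*mkt:region\s+(?P<attrs>.*?)\s*-->")
REGION_CLOSE_RE = re.compile(r"<!--\s*/mkt:region\s*-->")


def find_regions(markdown: str) -> list[tuple[str, int, int, str]]:
    """Return (region_id, open_line, close_line, inner_text) for each region."""
    lines = markdown.splitlines()
    regions: list[tuple[str, int, int, str]] = []
    open_region: tuple[int, str] | None = None
    for index, line in enumerate(lines, start=1):
        open_match = REGION_OPEN_RE.search(line)
        if open_match and open_region is not None:
            raise ValueError("Nested regions are not supported")
        if REGION_CLOSE_RE.search(line):
            if open_region is None:
                raise ValueError("Close marker has no matching open marker")
            start_line, region_id = open_region
            # lines is 0-indexed: this slice covers the lines strictly between the markers.
            inner = "\n".join(lines[start_line:index - 1]).strip()
            regions.append((region_id, start_line, index, inner))
            open_region = None
            continue
        if open_match:
            pairs = (p.split("=", 1) for p in shlex.split(open_match.group("attrs")) if "=" in p)
            attrs = {key: value for key, value in pairs}
            if "id" not in attrs:
                raise ValueError("Region marker requires an id attribute")
            open_region = (index, attrs["id"])
    if open_region is not None:
        raise ValueError("Open marker has no matching close marker")
    return regions
```

The reported span includes both marker lines, while the text excludes them, which matches the `SourceSpan(start_line, index)` and `lines[start_line:index - 1]` pairing in the function above.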
def _fenced_block_units(markdown: str, target_path: Path) -> list[ContentUnit]:
parser = MarkdownIt("commonmark", {"tables": True}).enable("table")
units: list[ContentUnit] = []
used_ids: dict[str, int] = {}
for index, token in enumerate(parser.parse(markdown)):
if token.type != "fence":
continue
attrs = _parse_fence_info(token.info)
unit_id = attrs.get("id")
if not unit_id:
continue
line_start = token.map[0] + 1 if token.map else None
line_end = token.map[1] if token.map else None
units.append(
_content_unit(
kind="fenced_block",
unit_id=_dedupe_id(_slug(unit_id), used_ids),
text=token.content,
source_path=target_path,
span=SourceSpan(line_start, line_end),
name=unit_id,
metadata={
"language": attrs.get("language"),
"tags": _tags_from_attrs(attrs),
"attrs": {
key: value
for key, value in attrs.items()
if key not in {"id", "language", "tag", "tags"}
},
"block_index": index,
},
)
)
return units
def _line_range_units(markdown: str, target_path: Path, value: str) -> list[ContentUnit]:
match = re.match(r"^(?P<start>\d+)(?:-(?P<end>\d+))?$", value)
if not match:
raise ReferenceResolutionError("Line fragments must use `line:start` or `line:start-end`")
start = int(match.group("start"))
end = int(match.group("end") or start)
lines = markdown.splitlines()
if start < 1 or end < start or end > len(lines):
return []
text = "\n".join(lines[start - 1:end])
return [
_content_unit(
kind="line_range",
unit_id=f"line-{start}-{end}",
text=text,
source_path=target_path,
span=SourceSpan(start, end),
name=f"lines {start}-{end}",
metadata={},
)
]
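Line fragments are 1-indexed and inclusive, and an out-of-range request yields no units rather than an error. A minimal sketch of the same bounds logic, returning the text directly (`slice_lines` is a hypothetical name, and `None` stands in for the empty unit list):

```python
import re
from typing import Optional


def slice_lines(markdown: str, value: str) -> Optional[str]:
    """Return lines start..end (1-indexed, inclusive), or None when out of range."""
    match = re.match(r"^(?P<start>\d+)(?:-(?P<end>\d+))?$", value)
    if not match:
        raise ValueError("Line fragments must use `start` or `start-end`")
    start = int(match.group("start"))
    end = int(match.group("end") or start)  # a single number selects one line
    lines = markdown.splitlines()
    if start < 1 or end < start or end > len(lines):
        return None
    return "\n".join(lines[start - 1:end])


text = "a\nb\nc\nd\n"
print(slice_lines(text, "2-3"))
print(slice_lines(text, "3-9"))
```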
def _parse_fence_info(info: str) -> dict[str, str]:
match = _FENCE_ATTRS_RE.match(info.strip())
if not match:
return {"language": info.strip()} if info.strip() else {}
attrs = _parse_attrs(match.group("attrs") or "")
language = match.group("language")
if language:
attrs["language"] = language
if "id" not in attrs and attrs:
for key in list(attrs):
if key.startswith("#"):
attrs["id"] = key[1:]
del attrs[key]
break
return attrs
def _parse_attrs(raw: str) -> dict[str, str]:
attrs: dict[str, str] = {}
for part in shlex.split(raw):
if part.startswith("#") and len(part) > 1:
attrs["id"] = part[1:]
continue
if "=" not in part:
attrs[part] = "true"
continue
key, value = part.split("=", 1)
attrs[key.strip()] = value.strip()
return attrs
def _tags_from_attrs(attrs: dict[str, str]) -> list[str]:
raw = attrs.get("tags") or attrs.get("tag") or ""
return [tag.strip() for tag in re.split(r"[, ]+", raw) if tag.strip()]


def _content_unit(
    *,
    kind: str,
    unit_id: str,
    text: str,
    source_path: Path,
    span: SourceSpan | None,
    name: str | None,
    metadata: dict[str, Any] | None = None,
) -> ContentUnit:
    return ContentUnit(
        kind=kind,
        unit_id=unit_id,
        text=text,
        source_path=str(source_path),
        span=span,
        name=name,
        content_hash="sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest(),
        metadata=metadata or {},
    )


def _heading_title_and_id(heading: Heading) -> tuple[str, str | None]:
    match = _HEADING_ID_RE.match(heading.text.strip())
    if not match:
        return heading.text.strip(), None
    return match.group("title").strip(), match.group("id")


def _dedupe_id(unit_id: str, used_ids: dict[str, int]) -> str:
    count = used_ids.get(unit_id, 0) + 1
    used_ids[unit_id] = count
    return unit_id if count == 1 else f"{unit_id}-{count}"


def _slug(value: str) -> str:
    slug = re.sub(r"[^a-z0-9_.:-]+", "-", value.strip().lower())
    slug = re.sub(r"-+", "-", slug).strip("-")
    return slug or "unit"


@@ -0,0 +1,106 @@
from pathlib import Path

from click.testing import CliRunner

from markitect_tool.cli import main
from markitect_tool.content_class import load_content_classes


def test_c3_linearization_for_diamond_inheritance():
    registry = load_content_classes(
        {
            "classes": {
                "base": {"slots": {"sections": ["Overview"]}},
                "left": {"extends": ["base"], "slots": {"sections": ["Left"]}},
                "right": {"extends": ["base"], "slots": {"sections": ["Right"]}},
                "leaf": {"extends": ["left", "right"], "slots": {"title": "Leaf"}},
            }
        }
    )

    assert registry.linearize("leaf") == ["leaf", "left", "right", "base"]


def test_compose_merges_slots_with_explicit_policies():
    registry = load_content_classes(
        {
            "classes": {
                "base": {
                    "slots": {
                        "sections": ["Overview"],
                        "assertions": {"tone": "plain", "depth": "short"},
                    }
                },
                "market": {
                    "extends": ["base"],
                    "slots": {
                        "sections": ["Pricing"],
                        "assertions": {"depth": "detailed"},
                    },
                    "merge_policies": {
                        "sections": "append",
                        "assertions": "deep_merge",
                    },
                },
                "instance": {
                    "extends": ["market"],
                    "slots": {"sections": ["Risks"]},
                    "merge_policies": {"sections": "append"},
                },
            }
        }
    )

    result = registry.compose("instance")

    assert result.valid
    assert result.slots["sections"] == ["Overview", "Pricing", "Risks"]
    assert result.slots["assertions"] == {"tone": "plain", "depth": "detailed"}


def test_compose_reports_error_on_conflict():
    registry = load_content_classes(
        {
            "classes": {
                "base": {"slots": {"owner": "A"}},
                "instance": {
                    "extends": ["base"],
                    "slots": {"owner": "B"},
                    "merge_policies": {"owner": "error_on_conflict"},
                },
            }
        }
    )

    result = registry.compose("instance")

    assert not result.valid
    assert result.diagnostics[0].code == "content_class.merge_conflict"


def test_mkt_class_resolve_outputs_text(tmp_path: Path):
    class_file = tmp_path / "classes.yaml"
    class_file.write_text(
        """classes:
  base:
    slots:
      sections:
        - Overview
  instance:
    extends:
      - base
    slots:
      sections:
        - Risks
    merge_policies:
      sections: append
""",
        encoding="utf-8",
    )

    result = CliRunner().invoke(main, ["class", "resolve", str(class_file), "instance"])

    assert result.exit_code == 0
    assert "linearization: instance -> base" in result.output
    assert "Overview" in result.output
    assert "Risks" in result.output


@@ -0,0 +1,93 @@
from pathlib import Path
import pytest
from click.testing import CliRunner
from markitect_tool.cli import main
from markitect_tool.explode import (
EXPLODE_MANIFEST_NAME,
ExplodeError,
explode_markdown_file,
implode_markdown_directory,
)
ROUNDTRIP_DOC = """---
title: Explode Example
---
Opening text before the first heading.
# Intro
Intro body.
## Detail
Detail body.
# Later
Later body.
"""
def test_flat_explode_implode_roundtrips_exact_markdown(tmp_path: Path):
source = tmp_path / "source.md"
output_dir = tmp_path / "exploded"
source.write_text(ROUNDTRIP_DOC, encoding="utf-8")
result = explode_markdown_file(source, output_dir, variant="flat")
imploded = implode_markdown_directory(output_dir)
assert Path(result.manifest_path).name == EXPLODE_MANIFEST_NAME
assert (output_dir / "00-preamble.md").exists()
assert (output_dir / "sections" / "01-intro.md").exists()
assert imploded.markdown == ROUNDTRIP_DOC
assert imploded.current_hash == result.manifest.source_hash
def test_hierarchical_explode_places_child_sections_under_parent(tmp_path: Path):
source = tmp_path / "source.md"
output_dir = tmp_path / "exploded"
source.write_text(ROUNDTRIP_DOC, encoding="utf-8")
result = explode_markdown_file(source, output_dir, variant="hierarchical")
files = {Path(path).relative_to(output_dir).as_posix() for path in result.written_files}
assert "01-intro.md" in files
assert "01-intro/02-detail.md" in files
assert implode_markdown_directory(output_dir).markdown == ROUNDTRIP_DOC
def test_explode_rejects_non_empty_output_without_force(tmp_path: Path):
source = tmp_path / "source.md"
output_dir = tmp_path / "exploded"
output_dir.mkdir()
(output_dir / "existing.md").write_text("Existing", encoding="utf-8")
source.write_text(ROUNDTRIP_DOC, encoding="utf-8")
with pytest.raises(ExplodeError, match="not empty"):
explode_markdown_file(source, output_dir)
def test_mkt_explode_and_implode(tmp_path: Path):
source = tmp_path / "source.md"
output_dir = tmp_path / "exploded"
rebuilt = tmp_path / "rebuilt.md"
source.write_text(ROUNDTRIP_DOC, encoding="utf-8")
runner = CliRunner()
explode_result = runner.invoke(
main,
["explode", str(source), "--output-dir", str(output_dir), "--variant", "flat"],
)
implode_result = runner.invoke(
main,
["implode", str(output_dir), "--output", str(rebuilt)],
)
assert explode_result.exit_code == 0
assert "entries: 4" in explode_result.output
assert implode_result.exit_code == 0
assert rebuilt.read_text(encoding="utf-8") == ROUNDTRIP_DOC


@@ -0,0 +1,91 @@
from pathlib import Path
from click.testing import CliRunner
from markitect_tool.cli import main
from markitect_tool.literate import (
discover_code_chunks,
tangle_markdown,
weave_markdown,
write_tangle_files,
)
LITERATE_DOC = """# Literate Example
```python {#helpers}
def helper():
return "ready"
```
```python {#main tangle="src/app.py"}
<<helpers>>
def main():
return helper()
```
"""
def test_discover_code_chunks_with_references_and_targets():
chunks = discover_code_chunks(LITERATE_DOC, source_path="example.md")
assert [chunk.chunk_id for chunk in chunks] == ["helpers", "main"]
assert chunks[1].target_path == "src/app.py"
assert chunks[1].references == ["helpers"]
def test_tangle_expands_named_chunk_references():
result = tangle_markdown(LITERATE_DOC, source_path="example.md")
assert result.valid
assert len(result.files) == 1
assert result.files[0].path == "src/app.py"
assert "def helper" in result.files[0].content
assert "<<helpers>>" not in result.files[0].content
assert result.provenance[0].operation == "literate.tangle"
def test_tangle_reports_missing_chunk_reference():
markdown = """```python {#main tangle="src/app.py"}
<<missing>>
```
"""
result = tangle_markdown(markdown, source_path="example.md")
assert not result.valid
assert result.diagnostics[0].code == "literate.missing_chunk"
def test_weave_appends_chunk_index():
result = weave_markdown(LITERATE_DOC, source_path="example.md")
assert "## Code Chunk Index" in result.markdown
assert "`main` -> `src/app.py`; refs: `helpers`" in result.markdown
def test_write_tangle_files(tmp_path: Path):
result = tangle_markdown(LITERATE_DOC, source_path="example.md")
written = write_tangle_files(result, tmp_path)
assert written == [str(tmp_path / "src" / "app.py")]
assert "def main" in (tmp_path / "src" / "app.py").read_text(encoding="utf-8")
def test_mkt_tangle_and_weave(tmp_path: Path):
source = tmp_path / "literate.md"
output_dir = tmp_path / "out"
woven = tmp_path / "woven.md"
source.write_text(LITERATE_DOC, encoding="utf-8")
runner = CliRunner()
tangle_result = runner.invoke(main, ["tangle", str(source), "--output-dir", str(output_dir)])
weave_result = runner.invoke(main, ["weave", str(source), "--output", str(woven)])
assert tangle_result.exit_code == 0
assert "files: 1" in tangle_result.output
assert (output_dir / "src" / "app.py").exists()
assert weave_result.exit_code == 0
assert "## Code Chunk Index" in woven.read_text(encoding="utf-8")


@@ -34,6 +34,27 @@ title: Original
assert "## Intro" in result.markdown
assert "### Detail" in result.markdown
assert result.operations == ["set_frontmatter", "shift_headings:1"]
assert [event.operation for event in result.provenance] == [
"set_frontmatter",
"shift_headings",
]
def test_transform_shifts_headings_without_touching_fenced_code():
markdown = """# Intro
```markdown
# Literal Heading
```
## Real Heading
"""
result = transform_markdown(markdown, heading_delta=1)
assert "```markdown\n# Literal Heading\n```" in result.markdown
assert "### Real Heading" in result.markdown
assert result.provenance[0].metadata["affected_lines"] == [1, 7]
def test_transform_extracts_selector_text():
@@ -104,6 +125,25 @@ def test_resolve_includes_supports_brace_shorthand(tmp_path: Path):
assert "Before" in result.markdown
assert "Included body." in result.markdown
assert "After" in result.markdown
assert result.provenance[0].operation == "include"
assert result.provenance[0].target_path == str(partial.resolve())
def test_resolve_includes_ignores_markers_inside_fenced_code(tmp_path: Path):
partial = tmp_path / "partial.md"
partial.write_text("Included body.", encoding="utf-8")
markdown = """```markdown
{{include:partial.md}}
```
{{include:partial.md}}
"""
result = resolve_includes(markdown, base_dir=tmp_path)
assert result.markdown.count("Included body.") == 1
assert "{{include:partial.md}}" in result.markdown
assert result.included_paths == [str(partial.resolve())]
def test_resolve_includes_rejects_cycles(tmp_path: Path):


@@ -0,0 +1,105 @@
from pathlib import Path
from click.testing import CliRunner
from markitect_tool.cli import main
from markitect_tool.core import parse_markdown
from markitect_tool.processor import (
ProcessorContext,
default_processor_registry,
discover_fenced_processors,
run_fenced_processors,
)
from markitect_tool.reference import load_namespaces
def test_discover_fenced_processors_from_language_prefix():
markdown = """# Doc
```mkt-uppercase {#shout}
hello
```
"""
blocks = discover_fenced_processors(markdown, source_path="doc.md")
assert len(blocks) == 1
assert blocks[0].processor == "uppercase"
assert blocks[0].unit_id == "shout"
assert blocks[0].line_start == 3
def test_default_registry_runs_uppercase_processor():
markdown = """```mkt-uppercase {#shout}
hello
```
"""
context = ProcessorContext()
run = run_fenced_processors(markdown, context=context)
assert run.valid
assert run.results[0].content == "HELLO\n"
assert run.results[0].provenance[0].operation == "processor.uppercase"
def test_include_processor_uses_reference_resolver(tmp_path: Path):
source = tmp_path / "doc.md"
partial = tmp_path / "partial.md"
source.write_text(
"""---
namespaces:
local: .
---
```mkt-include {#intro ref="local:partial.md#summary"}
```
""",
encoding="utf-8",
)
partial.write_text("# Partial\n\n## Summary\n\nIncluded summary.\n", encoding="utf-8")
document = parse_markdown(source.read_text(encoding="utf-8"), source_path=str(source))
context = ProcessorContext(
root=tmp_path,
current_path=source,
namespaces=load_namespaces(document.frontmatter),
)
run = run_fenced_processors(source.read_text(encoding="utf-8"), context=context)
assert run.valid
assert run.results[0].dependencies == [str(partial.resolve())]
assert "Included summary" in run.results[0].content
def test_unknown_processor_returns_diagnostic():
markdown = """```mkt-nope {#x}
content
```
"""
registry = default_processor_registry()
run = run_fenced_processors(markdown, context=ProcessorContext(), registry=registry)
assert not run.valid
assert run.results[0].diagnostics[0].code == "processor.unknown"
def test_mkt_process_outputs_text(tmp_path: Path):
source = tmp_path / "doc.md"
source.write_text(
"""# Doc
```mkt-uppercase {#shout}
hello
```
""",
encoding="utf-8",
)
result = CliRunner().invoke(main, ["process", str(source), "--root", str(tmp_path)])
assert result.exit_code == 0
assert "valid" in result.output
assert "uppercase shout" in result.output
assert "HELLO" in result.output


@@ -0,0 +1,195 @@
from pathlib import Path
import pytest
from click.testing import CliRunner
from markitect_tool.cli import main
from markitect_tool.core import parse_markdown
from markitect_tool.reference import (
ReferenceContext,
ReferenceResolutionError,
load_namespaces,
parse_reference,
resolve_reference,
)
def test_parse_reference_splits_namespace_fragment_and_selector():
address = parse_reference("std:clauses/payment.md#section:fees::blocks[type=code]")
assert address.namespace == "std"
assert address.address == "clauses/payment.md"
assert address.fragment == "section:fees"
assert address.selector == "blocks[type=code]"
def test_load_namespaces_accepts_optional_colon_suffix():
namespaces = load_namespaces({"namespaces": {"std:": "./standard", "src": "../src"}})
assert namespaces == {"std": "./standard", "src": "../src"}
def test_resolve_path_reference_returns_document_unit(tmp_path: Path):
context_file = tmp_path / "context.md"
target_file = tmp_path / "target.md"
context_file.write_text("# Context\n", encoding="utf-8")
target_file.write_text("---\nid: target-doc\ntitle: Target\n---\n\n# Target\n\nBody.", encoding="utf-8")
context = ReferenceContext(root=tmp_path, current_path=context_file)
resolution = resolve_reference("target.md", context=context)
assert resolution.target_path == str(target_file.resolve())
assert len(resolution.units) == 1
assert resolution.units[0].kind == "document"
assert resolution.units[0].unit_id == "target-doc"
assert "# Target" in resolution.units[0].text
def test_resolve_namespace_reference_and_explicit_section_id(tmp_path: Path):
standard = tmp_path / "standard"
standard.mkdir()
context_file = tmp_path / "context.md"
clause_file = standard / "clauses.md"
context_file.write_text(
"---\nnamespaces:\n std: ./standard\n---\n\n# Context\n",
encoding="utf-8",
)
clause_file.write_text(
"# Clauses\n\n## Payment Terms {#payment-terms}\n\nPay within 30 days.\n",
encoding="utf-8",
)
document = parse_markdown(context_file.read_text(encoding="utf-8"), source_path=str(context_file))
context = ReferenceContext.from_document(document, root=tmp_path)
resolution = resolve_reference("std:clauses.md#section:payment-terms", context=context)
assert resolution.units[0].kind == "section"
assert resolution.units[0].unit_id == "payment-terms"
assert resolution.units[0].name == "Payment Terms"
assert "Pay within 30 days" in resolution.units[0].text
def test_resolve_selector_reference_uses_existing_query_engine(tmp_path: Path):
standard = tmp_path / "standard"
standard.mkdir()
context_file = tmp_path / "context.md"
source_file = standard / "clauses.md"
context_file.write_text(
"---\nnamespaces:\n std: ./standard\n---\n\n# Context\n",
encoding="utf-8",
)
source_file.write_text(
"# Clauses\n\n## Warranty\n\nWarranty text.\n\n## Liability\n\nLiability text.\n",
encoding="utf-8",
)
context = ReferenceContext.from_document(parse_markdown(context_file.read_text(encoding="utf-8"), str(context_file)), root=tmp_path)
resolution = resolve_reference("std:clauses.md::sections[heading=Warranty]", context=context)
assert [unit.kind for unit in resolution.units] == ["section"]
assert resolution.units[0].name == "Warranty"
assert "Liability" not in resolution.units[0].text
def test_resolve_pathless_fragment_uses_current_document(tmp_path: Path):
context_file = tmp_path / "context.md"
context_file.write_text("# Context\n\n## Overview\n\nUseful local context.\n", encoding="utf-8")
context = ReferenceContext(root=tmp_path, current_path=context_file)
resolution = resolve_reference("#overview", context=context)
assert resolution.target_path == str(context_file.resolve())
assert resolution.units[0].kind == "section"
assert resolution.units[0].unit_id == "overview"
assert "Useful local context" in resolution.units[0].text
def test_resolve_named_region_by_id_and_tag(tmp_path: Path):
context_file = tmp_path / "context.md"
context_file.write_text(
"""# Context
<!-- mkt:region id="overview" tags="reuse summary" -->
Reusable region text.
<!-- /mkt:region -->
""",
encoding="utf-8",
)
context = ReferenceContext(root=tmp_path, current_path=context_file)
by_id = resolve_reference("#region:overview", context=context)
by_tag = resolve_reference("#tag:summary", context=context)
assert by_id.units[0].kind == "region"
assert by_id.units[0].text == "Reusable region text."
assert by_tag.units[0].unit_id == "overview"
def test_resolve_fenced_block_by_id(tmp_path: Path):
context_file = tmp_path / "context.md"
context_file.write_text(
"""# Context
```python {#load-config tags="code setup" tangle="src/config.py"}
def load_config():
return {}
```
""",
encoding="utf-8",
)
context = ReferenceContext(root=tmp_path, current_path=context_file)
resolution = resolve_reference("#fence:load-config", context=context)
assert resolution.units[0].kind == "fenced_block"
assert resolution.units[0].unit_id == "load-config"
assert resolution.units[0].metadata["language"] == "python"
assert resolution.units[0].metadata["attrs"]["tangle"] == "src/config.py"
assert "def load_config" in resolution.units[0].text
def test_resolve_line_range_fragment(tmp_path: Path):
context_file = tmp_path / "context.md"
context_file.write_text("# Context\n\nLine A\nLine B\nLine C\n", encoding="utf-8")
context = ReferenceContext(root=tmp_path, current_path=context_file)
resolution = resolve_reference("#line:3-4", context=context)
assert resolution.units[0].kind == "line_range"
assert resolution.units[0].span.line_start == 3
assert resolution.units[0].text == "Line A\nLine B"
def test_resolve_rejects_unknown_namespace(tmp_path: Path):
context_file = tmp_path / "context.md"
context_file.write_text("# Context\n", encoding="utf-8")
context = ReferenceContext(root=tmp_path, current_path=context_file)
with pytest.raises(ReferenceResolutionError, match="Unknown namespace"):
resolve_reference("missing:doc.md", context=context)
def test_resolve_rejects_paths_outside_root(tmp_path: Path):
context_file = tmp_path / "context.md"
context_file.write_text("# Context\n", encoding="utf-8")
context = ReferenceContext(root=tmp_path, current_path=context_file)
with pytest.raises(ReferenceResolutionError, match="escapes root"):
resolve_reference("../outside.md", context=context)
def test_mkt_ref_resolve_outputs_text(tmp_path: Path):
context_file = tmp_path / "context.md"
target_file = tmp_path / "target.md"
context_file.write_text("# Context\n", encoding="utf-8")
target_file.write_text("# Target\n\n## Decision\n\nChosen.", encoding="utf-8")
result = CliRunner().invoke(
main,
["ref", "resolve", str(context_file), "target.md#decision", "--root", str(tmp_path)],
)
assert result.exit_code == 0
assert "1 unit(s)" in result.output
assert "section decision" in result.output
assert "Decision" in result.output


@@ -0,0 +1,60 @@
from pathlib import Path
from markitect_tool.core import parse_markdown_file
from markitect_tool.explode import explode_markdown_file, implode_markdown_directory
from markitect_tool.ops import resolve_includes
from markitect_tool.processor import ProcessorContext, run_fenced_processors
from markitect_tool.reference import load_namespaces
from markitect_tool.literate import tangle_markdown
EXAMPLES = Path("examples/migration")
def test_migration_explode_example_roundtrips(tmp_path: Path):
source = EXAMPLES / "legacy-explode-source.md"
original = source.read_text(encoding="utf-8")
explode_markdown_file(source, tmp_path / "exploded", variant="hierarchical")
result = implode_markdown_directory(tmp_path / "exploded")
assert result.markdown == original
def test_migration_reference_backed_transclusion_example():
source = EXAMPLES / "legacy-transclusion-context.md"
document = parse_markdown_file(source)
context = ProcessorContext(
root=EXAMPLES,
current_path=source,
namespaces=load_namespaces(document.frontmatter),
)
result = run_fenced_processors(source.read_text(encoding="utf-8"), context=context)
assert result.valid
assert "Payment is due within 30 days" in result.results[0].content
def test_migration_path_include_example():
source = EXAMPLES / "legacy-path-include.md"
result = resolve_includes(
source.read_text(encoding="utf-8"),
base_dir=EXAMPLES,
current_path=source,
)
assert "## Warranty" in result.markdown
assert "Warranty begins on the effective date" in result.markdown
def test_migration_literate_example_tangles():
source = EXAMPLES / "legacy-literate.md"
result = tangle_markdown(source.read_text(encoding="utf-8"), source_path=source)
assert result.valid
assert result.files[0].path == "src/app.py"
assert "CONFIG" in result.files[0].content
assert "<<config>>" not in result.files[0].content


@@ -3,7 +3,7 @@ id: MKTT-WP-0010
type: workplan
title: "Content References, Processors, and Literate Workflows"
domain: markitect
-status: todo
+status: done
owner: markitect-tool
topic_slug: markitect
planning_priority: P1
@@ -55,7 +55,7 @@ See `docs/content-reference-literate-workflow-research.md`.
```task
id: MKTT-WP-0010-T001
-status: todo
+status: done
priority: high
state_hub_task_id: "f70d2b9d-151b-46c6-9613-bd6bdbf164e7"
```
@@ -66,11 +66,18 @@ resolver inputs/outputs, and error cases.
Output: reference model docs, examples, and tests for path, namespace, selector,
and ID resolution.
Initial implementation completed with a `reference` extension package,
frontmatter namespace loading, root-bounded path resolution, existing query
selector reuse, heading/section/block fragment IDs, CLI access via
`mkt ref resolve`, reference docs, examples, and tests. Region/tag/fenced-block
addressing continues in P10.3; processor dependency/provenance use continues in
P10.2 and P10.5.
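
The reference grammar exercised by the tests (`namespace:path#fragment::selector`) can be illustrated with a minimal sketch. The names `SplitReference` and `split_reference` are hypothetical and are not the `parse_reference` implementation from this commit; the sketch also assumes that a namespace is a short identifier before the first colon, which would misread Windows drive letters.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: a namespace is an identifier-like prefix followed by a colon.
_NAMESPACE_RE = re.compile(r"^(?P<ns>[A-Za-z][A-Za-z0-9_-]*):(?P<rest>.*)$")


@dataclass(frozen=True)
class SplitReference:
    namespace: Optional[str]
    address: str
    fragment: Optional[str]
    selector: Optional[str]


def split_reference(raw: str) -> SplitReference:
    # Peel off the selector first so that "::" is never mistaken for a namespace colon.
    head, has_selector, selector = raw.partition("::")
    # The fragment is everything after the first "#".
    head, has_fragment, fragment = head.partition("#")
    match = _NAMESPACE_RE.match(head)
    namespace = match.group("ns") if match else None
    address = match.group("rest") if match else head
    return SplitReference(
        namespace=namespace,
        address=address,
        fragment=fragment if has_fragment else None,
        selector=selector if has_selector else None,
    )


ref = split_reference("std:clauses/payment.md#section:fees::blocks[type=code]")
print(ref.namespace, ref.address, ref.fragment, ref.selector)
```

Splitting from the right-most syntactic layer inward keeps each step a plain `str.partition`, which makes the error cases (empty address, missing fragment) easy to enumerate.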
## P10.2 - Add token-safe transforms and operation provenance
```task
id: MKTT-WP-0010-T002
-status: todo
+status: done
priority: high
state_hub_task_id: "e35639b7-756f-4993-8b3c-2e58b23e0eca"
```
@@ -80,11 +87,17 @@ structured operation provenance, dependency edges, source spans, and diagnostics
Output: token-safe transform implementation and provenance result envelope.
Initial implementation completed with token-safe heading shifts, include
markers that stay literal inside fenced or indented code blocks, additive
`OperationProvenance` events on transform/include results, dependency edges for
resolved includes, docs, and regression tests. Rich structured diagnostics and
source maps continue through P10.3, P10.4, and P10.5.
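
As an illustration of what "token-safe" means here, the following sketch shifts ATX headings while leaving lines inside fenced code untouched. `shift_headings` is a hypothetical helper, not the `transform_markdown` function from this commit, and it deliberately ignores indented code blocks and setext headings.

```python
import re

# Opening fence: up to three spaces of indent, then three or more backticks or tildes.
_FENCE_RE = re.compile(r"^ {0,3}(?P<marker>`{3,}|~{3,})")
# ATX heading: one to six hashes followed by whitespace or end of line.
_HEADING_RE = re.compile(r"^(?P<hashes>#{1,6})(?P<rest>\s.*)?$")


def shift_headings(markdown: str, delta: int) -> tuple[str, list[int]]:
    out: list[str] = []
    affected: list[int] = []
    open_fence = None  # marker string of the currently open fence, if any
    for number, line in enumerate(markdown.splitlines(), start=1):
        if open_fence is not None:
            body = line.strip()
            # A closing fence repeats the opening character at least as many times.
            if body.startswith(open_fence) and set(body) == {open_fence[0]}:
                open_fence = None
            out.append(line)
            continue
        fence = _FENCE_RE.match(line)
        if fence:
            open_fence = fence.group("marker")
            out.append(line)
            continue
        heading = _HEADING_RE.match(line)
        if heading:
            # Clamp to the valid heading range 1..6.
            level = min(6, max(1, len(heading.group("hashes")) + delta))
            out.append("#" * level + (heading.group("rest") or ""))
            affected.append(number)
        else:
            out.append(line)
    trailing = "\n" if markdown.endswith("\n") else ""
    return "\n".join(out) + trailing, affected
```

Returning the affected line numbers alongside the text is one way to feed a provenance event without re-scanning the document.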
## P10.3 - Implement named regions and addressable block selectors
```task
id: MKTT-WP-0010-T003
-status: todo
+status: done
priority: high
state_hub_task_id: "98cafe28-a364-48f1-ae55-cb47c71d9441"
```
@@ -94,11 +107,17 @@ selection by ID/tag/line range where appropriate.
Output: region parser/resolver, CLI examples, and source-snippet tests.
Initial implementation completed as reference-layer extensions: named
`mkt:region` comments, region tags, fenced-block IDs and tags from info-string
attributes, `#line:start-end` ranges, convenience ID lookup ordering, docs,
examples, and tests. Deeper source maps and processor-owned block semantics
continue in P10.5 and P10.6.
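
A rough sketch of how named `mkt:region` comments could be located is shown below. `find_regions` and its dictionary result shape are assumptions made for illustration; the real resolver returns `ContentUnit` objects, and nested or unterminated regions are simply not handled here.

```python
import re
import shlex

_OPEN_RE = re.compile(r"^<!--\s*mkt:region\b(?P<attrs>.*?)-->$")
_CLOSE_RE = re.compile(r"^<!--\s*/mkt:region\s*-->$")


def find_regions(markdown: str) -> list[dict]:
    regions: list[dict] = []
    current = None
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        opened = _OPEN_RE.match(stripped)
        if current is None and opened:
            # shlex handles quoted values such as tags="reuse summary".
            attrs = dict(
                part.split("=", 1)
                for part in shlex.split(opened.group("attrs"))
                if "=" in part
            )
            tags = [tag for tag in re.split(r"[, ]+", attrs.get("tags", "")) if tag]
            current = {"id": attrs.get("id", ""), "tags": tags, "start": number + 1, "lines": []}
        elif current is not None and _CLOSE_RE.match(stripped):
            body = current.pop("lines")
            current["text"] = "\n".join(body)
            current["end"] = number - 1
            regions.append(current)
            current = None
        elif current is not None:
            current["lines"].append(line)
    return regions
```

Lookup by ID or by tag then reduces to filtering this list, for example `[r for r in find_regions(text) if "summary" in r["tags"]]`.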
## P10.4 - Reimplement reversible explode/implode variants
```task
id: MKTT-WP-0010-T004
-status: todo
+status: done
priority: high
state_hub_task_id: "67f77aa1-a7ee-485c-891e-6ae7ecc52067"
```
@@ -111,11 +130,16 @@ reference and processor model is stable.
Output: `mkt explode`, `mkt implode`, manifest schema, roundtrip tests.
Initial implementation completed with a separate `explode` extension package,
manifest-first flat and hierarchical variants, exact roundtrip implode,
non-empty output protection, CLI commands, docs, and tests. Semantic variants
remain deferred until processor and content-class semantics are stable.
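
The manifest-first roundtrip idea can be sketched in memory, without touching the file system. `explode_flat` and `implode` below are hypothetical stand-ins for `explode_markdown_file` and `implode_markdown_directory`; the manifest layout is an assumption, and the fence detection is simplified to a toggle.

```python
import hashlib

FENCE_MARKERS = ("`" * 3, "~" * 3)


def _sha(text: str) -> str:
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def explode_flat(markdown: str) -> dict:
    parts: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in markdown.splitlines(keepends=True):
        if line.lstrip().startswith(FENCE_MARKERS):
            in_fence = not in_fence
        elif not in_fence and line.startswith("#") and current:
            # A heading outside code starts a new entry; text before it is kept verbatim.
            parts.append("".join(current))
            current = []
        current.append(line)
    if current:
        parts.append("".join(current))
    entries = [{"order": i, "text": part, "hash": _sha(part)} for i, part in enumerate(parts)]
    return {"source_hash": _sha(markdown), "entries": entries}


def implode(manifest: dict) -> str:
    ordered = sorted(manifest["entries"], key=lambda entry: entry["order"])
    text = "".join(entry["text"] for entry in ordered)
    # The manifest hash is what makes the roundtrip verifiable rather than assumed.
    if _sha(text) != manifest["source_hash"]:
        raise ValueError("imploded text does not match the manifest source hash")
    return text
```

Because every entry keeps its original bytes, including line endings, concatenation in manifest order reproduces the source exactly, and any edit to an entry is detected by the hash check.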
## P10.5 - Define processor registry for fenced blocks
```task
id: MKTT-WP-0010-T005
-status: todo
+status: done
priority: high
state_hub_task_id: "eb7cde08-8a73-4163-ac54-19a2bc7b5f88"
```
@@ -126,11 +150,18 @@ and return generated content/files, diagnostics, dependencies, and provenance.
Output: processor registry API, deterministic built-in processors, and tests.
Initial implementation completed with a deterministic `processor` extension
package, fenced-block discovery, explicit registry, context/policy envelope,
result files/diagnostics/dependencies/provenance, built-in identity,
uppercase, and reference-backed include processors, CLI `mkt process`, docs,
examples, and tests. Arbitrary code or LLM execution remains intentionally
outside this deterministic registry floor.
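
To make the registry idea concrete, here is a small sketch of an explicit name-to-callable registry applied to `mkt-` prefixed fences. `REGISTRY` and `run_processors` are hypothetical names; the real `run_fenced_processors` takes a `ProcessorContext` and returns richer result objects. Only the `processor.unknown` diagnostic code is taken from the tests in this commit.

```python
import re
from typing import Callable

# Opening fence such as: (three backticks)mkt-uppercase {#shout}
_FENCE_OPEN = re.compile(r"^`{3}mkt-(?P<name>[a-z0-9_-]+)(?:\s+\{#(?P<id>[^\s}]+)[^}]*\})?\s*$")
_FENCE_CLOSE = "`" * 3

# Explicit registry: only deterministic, side-effect-free callables.
REGISTRY: dict[str, Callable[[str], str]] = {
    "identity": lambda text: text,
    "uppercase": str.upper,
}


def run_processors(markdown: str, registry: dict[str, Callable[[str], str]] = REGISTRY) -> list[dict]:
    results: list[dict] = []
    block = None
    for line in markdown.splitlines(keepends=True):
        if block is None:
            opened = _FENCE_OPEN.match(line.rstrip("\n"))
            if opened:
                block = {"processor": opened.group("name"), "unit_id": opened.group("id"), "body": []}
        elif line.strip() == _FENCE_CLOSE:
            body = "".join(block.pop("body"))
            handler = registry.get(block["processor"])
            if handler is None:
                # Unknown names become diagnostics instead of exceptions.
                block.update(content=None, diagnostics=["processor.unknown"])
            else:
                block.update(content=handler(body), diagnostics=[])
            results.append(block)
            block = None
        else:
            block["body"].append(line)
    return results
```

Keeping the registry an ordinary dictionary means the set of runnable processors is visible and auditable, which fits the stated goal of excluding arbitrary code execution.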
## P10.6 - Implement literate weave/tangle MVP
```task
id: MKTT-WP-0010-T006
-status: todo
+status: done
priority: high
state_hub_task_id: "090fcc38-758b-4414-b941-40f217eb17ca"
```
@@ -141,11 +172,16 @@ cross-references.
Output: `mkt tangle`, `mkt weave`, chunk-reference diagnostics, examples.
Initial implementation completed with a `literate` extension package, named
fenced code chunks, `tangle` targets, noweb-style `<<chunk-id>>` expansion,
missing/cyclic chunk diagnostics, deterministic file writing, woven chunk
index output, CLI `mkt tangle`/`mkt weave`, docs, examples, and tests.
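
The noweb-style expansion can be sketched as a short recursive function. `expand_chunk` is a hypothetical helper operating on a plain `dict` of chunk bodies; the real `tangle_markdown` reports diagnostics rather than raising, and the exception types below are illustrative choices.

```python
import re

# A reference occupies its own line: optional indentation, then <<chunk-id>>.
_REF_RE = re.compile(r"^(?P<indent>\s*)<<(?P<name>[^<>]+)>>\s*$")


def expand_chunk(name: str, chunks: dict[str, str], stack: tuple[str, ...] = ()) -> str:
    if name not in chunks:
        raise KeyError(f"missing chunk: {name}")
    if name in stack:
        raise ValueError("cyclic chunk reference: " + " -> ".join(stack + (name,)))
    out: list[str] = []
    for line in chunks[name].splitlines():
        ref = _REF_RE.match(line)
        if not ref:
            out.append(line)
            continue
        expanded = expand_chunk(ref.group("name").strip(), chunks, stack + (name,))
        # Re-indent the expanded body so nested chunks stay valid in indentation-sensitive code.
        indent = ref.group("indent")
        out.extend(indent + sub if sub else sub for sub in expanded.splitlines())
    return "\n".join(out)


chunks = {
    "helpers": 'def helper():\n    return "ready"',
    "main": "<<helpers>>\n\ndef main():\n    return helper()",
}
print(expand_chunk("main", chunks))
```

Tracking the expansion stack as an immutable tuple keeps cycle detection local to each call path, so the same chunk may still be referenced from several places.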
## P10.7 - Design content class composition and multi-inheritance
```task
id: MKTT-WP-0010-T007
-status: todo
+status: done
priority: medium
state_hub_task_id: "220e6b27-2d7b-4c22-b5e8-304198ecfea8"
```
@@ -156,11 +192,16 @@ diagnostics.
Output: architecture note, examples, and a small deterministic resolver spike.
Initial implementation completed with a `content_class` extension package,
C3-style deterministic linearization, explicit slot merge policies, conflict
diagnostics, CLI `mkt class resolve`, docs, examples, and tests. Markdown
instantiation and snippet injection remain deferred to later integration work.
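
For readers unfamiliar with C3, the following is a minimal sketch of the merge step over a plain mapping of class name to parent list. `c3_linearize` is a hypothetical function, not the `registry.linearize` method from this commit, and it raises exceptions where the real resolver emits diagnostics.

```python
def c3_linearize(name: str, parents_of: dict[str, list[str]], visiting: tuple[str, ...] = ()) -> list[str]:
    if name in visiting:
        raise ValueError("inheritance cycle: " + " -> ".join(visiting + (name,)))
    if name not in parents_of:
        raise KeyError(f"unknown class: {name}")
    parents = parents_of[name]
    # Merge the linearizations of all parents plus the local parent order.
    sequences = [c3_linearize(parent, parents_of, visiting + (name,)) for parent in parents]
    sequences.append(list(parents))
    sequences = [seq for seq in sequences if seq]
    result = [name]
    while sequences:
        for seq in sequences:
            head = seq[0]
            # A head is acceptable only if it appears in no other sequence's tail.
            if not any(head in other[1:] for other in sequences):
                break
        else:
            raise ValueError(f"inconsistent precedence while linearizing {name}")
        result.append(head)
        sequences = [[item for item in seq if item != head] for seq in sequences]
        sequences = [seq for seq in sequences if seq]
    return result


diamond = {"base": [], "left": ["base"], "right": ["base"], "leaf": ["left", "right"]}
print(c3_linearize("leaf", diamond))
```

The result is ordered leaf first; merging slots "from base to leaf" would then walk this list in reverse, so that more specific classes are applied last.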
## P10.8 - Add migration examples from markitect-main
```task
id: MKTT-WP-0010-T008
-status: todo
+status: done
priority: high
state_hub_task_id: "287637d3-1997-43b2-b97d-10587d565cec"
```
@@ -169,3 +210,9 @@ Translate the relevant old explode/implode, transclusion, and spaces reference
graph tests into successor-style fixtures and examples.
Output: migration test inventory, example documents, and parity notes.
Initial implementation completed with WP-0010 migration parity notes,
successor-style examples for explode/implode, path include, reference-backed
transclusion, and literate tangling, plus tests that exercise these examples.
Legacy platform, database, infospace, rendering, and provider-specific
behaviors remain intentionally out of scope.