extension for ref resolve, explode, implode, weave, tangle

2026-05-04 02:25:49 +02:00
parent 8203f50fd5
commit 65bfc1aebf
39 changed files with 3959 additions and 25 deletions

docs/content-classes.md

@@ -0,0 +1,79 @@
# Content Classes
Date: 2026-05-04
## Purpose
Content classes are data-defined composition rules for reusable document
structures, overlays, and variants. They are not Python inheritance. They are a
deterministic way to combine slots such as sections, assertions, snippets,
processors, and style guidance.
This is the P10.7 resolver spike for future class/object-style workflows.
## Model
A class can declare:
- `extends`: parent classes
- `slots`: structured values to contribute
- `merge_policies`: per-slot merge behavior
Example:
```yaml
classes:
  base-prd:
    slots:
      sections:
        - Problem
        - Decision
  enterprise:
    extends:
      - base-prd
    slots:
      sections:
        - Compliance
    merge_policies:
      sections: append
```
## Linearization
Multiple inheritance uses a C3-style linearization. That gives us:
- deterministic parent ordering
- monotonic inheritance behavior
- explicit diagnostics for cycles, unknown parents, and inconsistent precedence
The resolved class is merged from base to leaf according to the computed
linearization.
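The linearization step can be illustrated with a minimal sketch. The function name `c3_linearize` and the plain `parents` mapping are assumptions made for this example, not the resolver's actual API, and the sketch raises exceptions where the resolver would emit diagnostics.

```python
from typing import Dict, List, Tuple


def c3_linearize(
    name: str, parents: Dict[str, List[str]], _stack: Tuple[str, ...] = ()
) -> List[str]:
    """Return the C3 linearization of `name`, leaf first (hypothetical helper)."""
    if name in _stack:
        raise ValueError("cycle detected: " + " -> ".join(_stack + (name,)))
    if name not in parents:
        raise ValueError(f"unknown class: {name}")
    bases = parents[name]
    # Merge every parent's linearization plus the declared parent order.
    sequences = [c3_linearize(b, parents, _stack + (name,)) for b in bases]
    sequences.append(list(bases))
    result = [name]
    while any(sequences):
        for seq in sequences:
            if not seq:
                continue
            head = seq[0]
            # A head is acceptable only if no other sequence still needs
            # something to come before it (it is in no sequence's tail).
            if not any(head in other[1:] for other in sequences):
                break
        else:
            raise ValueError(f"inconsistent precedence while linearizing {name}")
        result.append(head)
        for seq in sequences:
            if seq and seq[0] == head:
                del seq[0]
    return result


parents = {
    "base-prd": [],
    "legal": ["base-prd"],
    "brand": ["base-prd"],
    "enterprise": ["legal", "brand"],
}
leaf_first = c3_linearize("enterprise", parents)
merge_order = list(reversed(leaf_first))  # base to leaf, as used for merging
```

Reversing the leaf-first result gives the base-to-leaf order in which slots would be merged.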
## Merge Policies
Initial policies:
- `replace`
- `append`
- `prepend`
- `deep_merge`
- `error_on_conflict`
Unknown policies and invalid value shapes produce diagnostics.
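As a rough sketch of how these policies might behave, the hypothetical `merge_slot` helper below applies one policy to a base value and an incoming value. The exact semantics (for example, that `error_on_conflict` tolerates equal values) are assumptions, and the sketch raises exceptions where the resolver reports diagnostics.

```python
from typing import Any


def merge_slot(policy: str, base: Any, incoming: Any) -> Any:
    """Combine one slot value under a named policy (illustrative only)."""
    if policy == "replace":
        return incoming
    if policy in ("append", "prepend"):
        if not isinstance(base, list) or not isinstance(incoming, list):
            raise TypeError(f"{policy} requires list values")
        return base + incoming if policy == "append" else incoming + base
    if policy == "deep_merge":
        if not isinstance(base, dict) or not isinstance(incoming, dict):
            raise TypeError("deep_merge requires mapping values")
        merged = dict(base)
        for key, value in incoming.items():
            if isinstance(merged.get(key), dict) and isinstance(value, dict):
                # Recurse only when both sides are mappings.
                merged[key] = merge_slot("deep_merge", merged[key], value)
            else:
                merged[key] = value
        return merged
    if policy == "error_on_conflict":
        if base != incoming:
            raise ValueError("conflicting slot values")
        return base
    raise ValueError(f"unknown merge policy: {policy}")
```

With the example classes above, `merge_slot("append", ["Problem", "Decision"], ["Compliance"])` would yield the three sections in base-to-leaf order.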
## CLI
Resolve a class:
```bash
mkt class resolve examples/classes/prd-classes.yaml enterprise-prd
```
JSON/YAML output includes the linearization, merged slots, and diagnostics.
## Extension Boundary
The current resolver does not yet instantiate Markdown documents or inject
snippets. It establishes the deterministic inheritance-and-merge foundation. Later
work can connect resolved slots to contracts, references, processors, and
generation plans.

docs/content-references.md

@@ -0,0 +1,139 @@
# Content References
Date: 2026-05-04
## Purpose
Content references are the first WP-0010 extension layer. They give Markitect a
shared way to name and resolve Markdown content units without changing the
existing parser, query, transform, compose, include, contract, or cache APIs.
The goal is a small resolver that later features can reuse:
- includes can accept references as well as paths
- explode/implode can write manifests with stable unit IDs
- processors can receive typed units and dependency edges
- tangle/weave can address chunks and generated outputs
- cache and access-control backends can index the same IDs
## Reference Syntax
References are compact strings:
```text
path/to/file.md
path/to/file.md#section:introduction
path/to/file.md::sections[heading=Decision]
std:clauses/payment.md
std:clauses/payment.md#payment-terms
std:clauses/payment.md#region:boilerplate
std:clauses/payment.md#tag:legal
#local-section
```
The parts are:
- `namespace:`: optional namespace declared in frontmatter
- `path`: a Markdown file path relative to the current document, or relative to
the namespace target
- `#fragment`: optional unit lookup inside the target document
- `::selector`: optional existing Markitect query selector
Fragments and selectors are mutually exclusive during resolution. Selectors are
delegated to the existing query engine, which keeps this layer small and avoids
inventing a second query language.
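The decomposition above can be sketched as a small parser. `ParsedRef` and `parse_reference` are hypothetical names, and the namespace pattern (a leading identifier followed by a single colon) is an assumption about what the resolver accepts.

```python
import re
from typing import NamedTuple, Optional


class ParsedRef(NamedTuple):
    namespace: Optional[str]
    path: str
    fragment: Optional[str]
    selector: Optional[str]


# Assumed shape of a namespace prefix: identifier characters, then ":".
_NAMESPACE = re.compile(r"^([A-Za-z][\w-]*):")


def parse_reference(ref: str) -> ParsedRef:
    """Split a reference string into namespace, path, fragment, selector."""
    rest, selector = ref, None
    if "::" in rest:
        rest, selector = rest.split("::", 1)
    fragment = None
    if "#" in rest:
        rest, fragment = rest.split("#", 1)
    if fragment is not None and selector is not None:
        raise ValueError("fragment and selector are mutually exclusive")
    namespace = None
    match = _NAMESPACE.match(rest)
    if match:
        namespace = match.group(1)
        rest = rest[match.end():]
    return ParsedRef(namespace, rest, fragment, selector)
```

An empty `path`, as in `#local-section`, would then mean "the current document".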
## Namespaces
Namespaces live in Markdown frontmatter:
```yaml
---
namespaces:
  std: ./standard
  product: ../product-docs
---
```
Namespace keys may be written with or without a trailing colon. Namespace values
are string paths. Relative namespace paths resolve under the resolver root. All
resolved file paths must stay inside that root.
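The containment rule can be illustrated with a short sketch. `resolve_namespaced_path` is a hypothetical helper, and joining the namespace value directly onto the root is an assumption about how relative namespace paths are interpreted.

```python
from pathlib import Path
from typing import Dict


def resolve_namespaced_path(
    root: Path, namespaces: Dict[str, str], namespace: str, relative: str
) -> Path:
    """Resolve `namespace:relative` and refuse paths that leave `root`."""
    # Accept keys written as "std" or "std:".
    targets = {key.rstrip(":"): value for key, value in namespaces.items()}
    if namespace not in targets:
        raise KeyError(f"unknown namespace: {namespace}")
    root = root.resolve()
    candidate = (root / targets[namespace] / relative).resolve()
    # After normalising "..", the candidate must still sit under the root.
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"{candidate} escapes resolver root {root}")
    return candidate
```

Checking containment after `resolve()` means `..` segments in either the namespace value or the reference path are caught by the same test.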
## Content Units
The resolver currently emits these unit kinds:
- `document`: full Markdown file
- `section`: heading-led Markdown section
- `heading`: heading line
- existing query kinds such as `frontmatter`, `block`, `metrics`, or `section`
Each unit includes:
- `unit_id`: stable local ID
- `kind`
- `source_path`
- source line span when available
- `name`
- `content_hash`
- raw text
- metadata from the source or query match
Heading and section IDs use an explicit trailing heading ID when present:
```markdown
## Payment Terms {#payment-terms}
```
Otherwise the resolver derives a slug from the heading text and adds numeric
suffixes for collisions.
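A minimal sketch of that ID rule follows. The slug algorithm and the `-2`, `-3` suffix format are assumptions for illustration; the resolver's actual slugs may differ.

```python
import re
from typing import Dict, List

# A trailing "{#some-id}" marks an explicit heading ID.
_EXPLICIT_ID = re.compile(r"\s*\{#([A-Za-z0-9_-]+)\}\s*$")


def heading_ids(headings: List[str]) -> List[str]:
    """Assign an ID to each heading text, in document order."""
    seen: Dict[str, int] = {}
    ids: List[str] = []
    for text in headings:
        match = _EXPLICIT_ID.search(text)
        if match:
            explicit = match.group(1)
            seen[explicit] = seen.get(explicit, 0) + 1
            ids.append(explicit)
            continue
        # Derive a slug: lowercase, runs of non-alphanumerics become "-".
        slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-") or "section"
        count = seen.get(slug, 0)
        seen[slug] = count + 1
        ids.append(slug if count == 0 else f"{slug}-{count + 1}")
    return ids
```

Because suffixes depend on document order, inserting a new heading with the same text earlier in the file would shift later derived IDs; explicit IDs avoid that.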
Named regions are delimited by HTML comments, so they can be embedded in
Markdown and in many other source file formats without changing the rendered
document:
```markdown
<!-- mkt:region id="boilerplate" tags="legal reuse" -->
Reusable text.
<!-- /mkt:region -->
```
Fenced blocks can be addressed when their info string includes an ID:
````markdown
```python {#load-config tags="code setup" tangle="src/config.py"}
def load_config():
    return {}
```
````
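For illustration, an info string like the one above could be split into a language and an attribute map as follows. `parse_info_string` is a hypothetical helper, and the attribute grammar it accepts (`#id` plus `key="value"` pairs inside braces) is inferred from the examples in this document.

```python
import re
from typing import Dict, Tuple

# Either "#some-id" or key="value"; values may contain "#" and spaces.
_ATTR = re.compile(r'#(?P<id>[\w-]+)|(?P<key>[\w-]+)="(?P<value>[^"]*)"')


def parse_info_string(info: str) -> Tuple[str, Dict[str, str]]:
    """Return (language, attributes) for a fenced block's info string."""
    language, _, rest = info.strip().partition("{")
    attrs: Dict[str, str] = {}
    for match in _ATTR.finditer(rest.rstrip("}")):
        if match.group("id"):
            attrs["id"] = match.group("id")
        else:
            attrs[match.group("key")] = match.group("value")
    return language.strip(), attrs
```

A block without braces simply yields an empty attribute map, so it would not be addressable by `#fence:<id>`.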
Supported fragments now include:
- `#section:<id-or-heading-slug>`
- `#heading:<id-or-heading-slug>`
- `#region:<id>`
- `#fence:<id>`
- `#tag:<tag>`
- `#line:<start>` or `#line:<start>-<end>`
- `#<id>` as a convenience lookup across sections, regions, fenced blocks, and
headings
## CLI
Resolve a reference from a context document:
```bash
mkt ref resolve examples/references/context.md 'std:clauses.md#payment-terms'
```
JSON and YAML formats include the resolved text and metadata:
```bash
mkt ref resolve examples/references/context.md 'std:clauses.md::sections[heading=Warranty]' --format json
```
## Extension Boundary
This layer is intentionally read-only. It does not replace `mkt include`,
`mkt query`, or `mkt extract`. Instead it defines the address model those tools
can adopt when their next WP-0010 tasks require richer content identity,
processor dependencies, source maps, and reversible manifests.

docs/explode-implode.md

@@ -0,0 +1,69 @@
# Explode and Implode
Date: 2026-05-04
## Purpose
`mkt explode` and `mkt implode` reintroduce the useful old Markitect
large-document workflow as a slim WP-0010 extension. The design is
manifest-first: the exploded directory is editable, but the manifest preserves
ordering, source spans, heading metadata, hashes, frontmatter, and the selected
layout variant.
This keeps the operation reversible without requiring a database or service.
## Variants
The initial variants are:
- `flat`: writes ordered section files under `sections/`.
- `hierarchical`: writes child section files below parent heading directories.
Both variants preserve the same manifest model. A later semantic variant can
reuse the reference and processor framework once those layers are stable.
## CLI
Explode a document:
```bash
mkt explode docs/source.md --output-dir work/source-exploded
```
Use a hierarchical directory shape:
```bash
mkt explode docs/source.md --output-dir work/source-tree --variant hierarchical
```
Implode the directory back into one Markdown file:
```bash
mkt implode work/source-exploded --output docs/source-rebuilt.md
```
By default `mkt explode` refuses to write into a non-empty output directory. Use
`--force` when an explicit overwrite is intended.
## Manifest
The manifest is written as `markitect-explode.yaml` in the output directory.
It records:
- manifest version
- original source path and SHA-256 hash
- variant
- raw frontmatter block
- ordered entries with file path, kind, unit ID, source line span, heading
metadata, and content hash
Implode reads the manifest entries in order and concatenates the current entry
files. If users edit section files, the rebuilt document reflects those edits
while preserving the original frontmatter and ordering.
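The implode step described above can be sketched in a few lines. The manifest keys `frontmatter`, `entries`, and `path` are hypothetical stand-ins for whatever `markitect-explode.yaml` actually uses, and the sketch assumes each entry file already ends with a newline.

```python
from typing import Any, Callable, Dict


def implode(manifest: Dict[str, Any], read_entry: Callable[[str], str]) -> str:
    """Rebuild one Markdown document from manifest-ordered entry files."""
    parts = []
    frontmatter = manifest.get("frontmatter")  # raw block, kept verbatim
    if frontmatter:
        parts.append(frontmatter)
    # Follow manifest order, not directory listing order.
    for entry in manifest["entries"]:
        parts.append(read_entry(entry["path"]))
    return "".join(parts)


manifest = {
    "frontmatter": "---\ntitle: Demo\n---\n",
    "entries": [
        {"path": "sections/001-intro.md"},
        {"path": "sections/002-details.md"},
    ],
}
files = {
    "sections/001-intro.md": "# Intro\n",
    "sections/002-details.md": "## Details (edited)\n",
}
rebuilt = implode(manifest, files.__getitem__)
```

Against a real exploded directory, `read_entry` could be something like `lambda p: (exploded_dir / p).read_text(encoding="utf-8")`.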
## Extension Boundary
This implementation is intentionally not semantic yet. It does not infer
contracts, classes, named chunks, or processor outputs. Instead it establishes a
small reversible substrate that later WP-0010 tasks can enrich with regions,
references, processors, source maps, and weave/tangle behavior.


@@ -0,0 +1,79 @@
# Literate Weave and Tangle
Date: 2026-05-04
## Purpose
The literate workflow layer brings a small Knuth-style weave/tangle capability
to Markdown without requiring a separate language. Prose stays in Markdown.
Named code chunks live in fenced blocks. Tangling emits source files.
Weaving keeps the document readable and adds a deterministic chunk index.
## Chunk Syntax
Named chunks use fenced block attributes:
````markdown
```python {#helpers}
def helper():
    return "ready"
```
````
A chunk becomes an output root when it declares `tangle`:
````markdown
```python {#main tangle="src/app.py"}
<<helpers>>
def main():
    return helper()
```
````
Chunk references use noweb-style syntax:
```text
<<helpers>>
```
Whole-line chunk references preserve indentation when expanded.
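A minimal sketch of whole-line expansion, assuming chunks are held in a plain name-to-text mapping. `expand_chunk` is a hypothetical helper; it raises exceptions for missing and cyclic references where the real tangler reports diagnostics, and it leaves inline (non-whole-line) references untouched.

```python
import re
from typing import Dict, Tuple

# A line that contains only an optionally indented <<name>> reference.
_REF_LINE = re.compile(r"^(?P<indent>[ \t]*)<<(?P<name>[\w.-]+)>>[ \t]*$")


def expand_chunk(
    name: str, chunks: Dict[str, str], _stack: Tuple[str, ...] = ()
) -> str:
    """Recursively expand whole-line chunk references inside `name`."""
    if name in _stack:
        raise ValueError("cyclic chunk reference: " + " -> ".join(_stack + (name,)))
    if name not in chunks:
        raise KeyError(f"missing chunk: {name}")
    out = []
    for line in chunks[name].splitlines():
        match = _REF_LINE.match(line)
        if not match:
            out.append(line)
            continue
        indent = match.group("indent")
        expanded = expand_chunk(match.group("name"), chunks, _stack + (name,))
        # Re-indent every non-empty expanded line by the reference's indent.
        out.extend(indent + sub if sub else sub for sub in expanded.splitlines())
    return "\n".join(out)
```

Carrying the reference's indentation onto each expanded line is what lets a chunk be dropped into a nested Python block without breaking it.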
## CLI
Tangle files:
```bash
mkt tangle examples/literate/app.md --output-dir build/literate
```
Inspect without writing:
```bash
mkt tangle examples/literate/app.md --format json
```
Weave documentation:
```bash
mkt weave examples/literate/app.md --output build/app-woven.md
```
## Diagnostics
Tangling reports structured diagnostics for missing chunks and cyclic chunk
references. Tangled files are only written by the CLI when the result is valid.
## Extension Boundary
The MVP deliberately keeps the model narrow:
- named fenced blocks
- `tangle="<path>"`
- deterministic document-order concatenation for repeated targets
- noweb-style chunk expansion
- generated chunk index during weave
Future extensions can add richer source maps, processor execution,
language-specific extraction, and class/namespace-aware chunk selection without
changing this initial chunk model.


@@ -0,0 +1,46 @@
# markitect-main WP-0010 Migration Notes
Date: 2026-05-04
## Purpose
This note captures the relevant `markitect-main` ideas that WP-0010 now
preserves in successor form.
The migration is conceptual rather than source-compatible. The successor keeps
Markdown-native behavior and removes old platform, database, infospace, and
service assumptions.
## Parity Map
| Legacy area | Successor shape | Status |
| --- | --- | --- |
| Explode/implode variants | `mkt explode`, `mkt implode`, manifest-first flat/hierarchical variants | Reimplemented |
| Transclusion/includes | `mkt include` for path markers; processor `mkt-include` for reference-backed content | Reimplemented with clearer boundaries |
| Spaces/infospace references | Frontmatter namespaces plus `mkt ref resolve` | Reframed as syntax-layer references |
| Fenced-block processors | Explicit deterministic processor registry | Reimplemented as opt-in extension |
| Literate workflows | `mkt tangle`, `mkt weave`, named fenced chunks, noweb references | Reimplemented as MVP |
| Content classes/overlays | Data-defined classes with C3-style linearization and merge policies | Resolver spike implemented |
## Intentionally Not Migrated
These old concerns stay out of the WP-0010 toolkit layer:
- database-backed infospace lifecycle
- GraphQL/service APIs
- provider-specific LLM execution
- rendering/plugin/browser/editor infrastructure
- project finance, wishlist, and profile tooling
## Migration Examples
Examples live under `examples/migration/`:
- `legacy-explode-source.md`: large document roundtrip via explode/implode.
- `legacy-transclusion-context.md`: namespace-backed reference include.
- `legacy-path-include.md`: simple path-based include marker.
- `legacy-literate.md`: named chunks tangled into source.
The tests in `tests/test_wp0010_migration_examples.py` exercise these files as
successor fixtures. They are deliberately small, but they lock down the
behaviors we most wanted to keep from `markitect-main`.

docs/processors.md

@@ -0,0 +1,81 @@
# Fenced-Block Processors
Date: 2026-05-04
## Purpose
The processor registry is the deterministic execution boundary for WP-0010.
It lets Markdown fenced blocks opt into named processors while keeping
execution explicit, inspectable, and non-magical.
Processors receive:
- the fenced content unit
- resolver-capable context
- variables and policy maps
Processors return:
- generated content
- optional generated files
- diagnostics
- dependencies
- operation provenance
No built-in processor runs arbitrary code.
## Syntax
A fenced block opts into processing by using an `mkt-<processor>` language:
````markdown
```mkt-uppercase {#shout}
hello
```
````
The processor can also be named with attributes:
````markdown
```markdown {#example processor="identity"}
Rendered as-is by the identity processor.
```
````
## Built-In Processors
Initial deterministic processors:
- `identity`: returns the fenced block content unchanged.
- `uppercase`: returns uppercased content; mainly a registry smoke-test.
- `include`: resolves a `ref` attribute through the content reference resolver.
Reference-backed include:
````markdown
```mkt-include {#payment ref="std:clauses.md#payment-terms"}
```
````
The include processor returns the resolved content, records the target file as
a dependency, and emits operation provenance.
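The registry, result envelope, and opt-in rules above can be sketched as follows. All names here (`ProcessorResult`, `REGISTRY`, `register`, `run_block`) are hypothetical and only mirror the fields and built-ins this document lists; the include processor is omitted because it needs the reference resolver.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ProcessorResult:
    """Illustrative result envelope with the fields described above."""
    content: str
    files: Dict[str, str] = field(default_factory=dict)
    diagnostics: List[str] = field(default_factory=list)
    dependencies: List[str] = field(default_factory=list)
    provenance: List[str] = field(default_factory=list)


Processor = Callable[[str, Dict[str, str]], ProcessorResult]
REGISTRY: Dict[str, Processor] = {}


def register(name: str) -> Callable[[Processor], Processor]:
    def decorator(fn: Processor) -> Processor:
        REGISTRY[name] = fn
        return fn
    return decorator


@register("identity")
def identity(content: str, attrs: Dict[str, str]) -> ProcessorResult:
    return ProcessorResult(content, provenance=["identity"])


@register("uppercase")
def uppercase(content: str, attrs: Dict[str, str]) -> ProcessorResult:
    return ProcessorResult(content.upper(), provenance=["uppercase"])


def run_block(
    language: str, content: str, attrs: Dict[str, str]
) -> Optional[ProcessorResult]:
    """Dispatch one fenced block; return None if it did not opt in."""
    name = attrs.get("processor")
    if name is None and language.startswith("mkt-"):
        name = language[len("mkt-"):]
    if name is None:
        return None
    processor = REGISTRY.get(name)
    if processor is None:
        return ProcessorResult("", diagnostics=[f"unknown processor: {name}"])
    return processor(content, attrs)
```

Returning the same envelope for failures as for successes keeps diagnostics, dependencies, and provenance in one place for callers.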
## CLI
Run processors in a document:
```bash
mkt process examples/references/context.md --format json
```
Text output reports processor validity, block IDs, and the first generated
content line. JSON/YAML output includes diagnostics, dependencies, and
provenance.
## Extension Boundary
The registry is deliberately small. It does not render a final document yet and
does not execute shell, Python, SQL, or LLM calls. Those can become opt-in
processors later, but they should use the same result envelope so diagnostics,
dependencies, provenance, cache invalidation, and access-control hooks stay
consistent.


@@ -27,6 +27,10 @@ Supported operations:
The API equivalent is `transform_markdown(...)`.
Heading shifts are token-safe: Markdown fenced and indented code blocks are
left untouched even if their lines look like headings. `TransformResult`
includes structured provenance events alongside the older operation-name list.
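To illustrate why code blocks must be skipped, here is a simplified, line-based sketch of a heading shift. It is not the token-based implementation behind `transform_markdown(...)`; `shift_headings` is a hypothetical helper that only recognises unindented ATX headings and backtick or tilde fences.

```python
import re

_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")
_HEADING = re.compile(r"^(#{1,6})(\s.*|)$")


def shift_headings(text: str, shift: int) -> str:
    """Shift ATX heading levels by `shift`, leaving code blocks untouched."""
    out = []
    fence = None  # opening fence marker while inside a fenced block
    for line in text.splitlines():
        marker = _FENCE.match(line)
        if fence is None:
            if marker:
                fence = marker.group(1)
            else:
                heading = _HEADING.match(line)
                if heading:
                    # Clamp to the valid heading range 1..6.
                    level = min(6, max(1, len(heading.group(1)) + shift))
                    line = "#" * level + heading.group(2)
        elif (
            marker
            and marker.group(1)[0] == fence[0]
            and len(marker.group(1)) >= len(fence)
            and line.strip() == marker.group(1)
        ):
            fence = None  # closing fence: same character, at least as long
        out.append(line)
    return "\n".join(out)
```

Lines indented by four or more spaces never match the heading pattern, so indented code blocks pass through unchanged in this sketch as well.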
## Compose
Use `mkt compose` to concatenate Markdown inputs with predictable separators:
@@ -79,5 +83,12 @@ Resolution rules:
directory.
- Recursive includes are resolved up to `--max-depth`.
- Cycles and missing files fail with explicit errors.
- Include markers inside fenced or indented code blocks are left literal.
The API equivalent is `resolve_includes(...)`.
`IncludeResult` includes structured provenance events. Each include event
records the source marker line when available, the resolved target path,
dependency edge, selector, heading shift, and frontmatter policy. This is the
first provenance envelope used by later WP-0010 processor, source-map, and
explode/implode work.


@@ -32,7 +32,7 @@ and descriptions mirror the operational view.
| `MKTT-WP-0004` | complete | done | `MKTT-WP-0001`, `MKTT-WP-0002` | Contract framework is complete and informs later validation/generation work. |
| `MKTT-WP-0003` | complete | done | `MKTT-WP-0001`, `MKTT-WP-0002`, `MKTT-WP-0004` | Core toolkit implementation is complete. |
| `MKTT-WP-0006` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T005` | Ready after transform/composition shape is clear; should account for future reference/provenance needs. |
| `MKTT-WP-0010` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Trigger is satisfied; keep as the richer content-reference, processor, explode/implode, and weave/tangle track. |
| `MKTT-WP-0010` | complete | done | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Content references, processors, explode/implode, weave/tangle, content classes, and migration examples are complete as the first WP-0010 extension layer. |
| `MKTT-WP-0007` | P2 | todo | `MKTT-WP-0006` | First practical cache backend use case: AST/JSONPath/SQLite/FTS. |
| `MKTT-WP-0005` | P2 | todo | `MKTT-WP-0003`, `MKTT-WP-0004` | Pick up when generation/form/context or semantic assessment pressure appears. |
| `MKTT-WP-0011` | P2 | todo | `MKTT-WP-0003`; task-level triggers: `MKTT-WP-0010-T001`, `MKTT-WP-0010-T005` | Declarative Markdown dataflow workflows: source extraction, deterministic/assisted processing, and multi-output generation. |