generated from coulomb/repo-seed
extension for ref resolve, explode, implode, weave, tangle
79
docs/content-classes.md
Normal file
@@ -0,0 +1,79 @@
# Content Classes

Date: 2026-05-04

## Purpose

Content classes are data-defined composition rules for reusable document
structures, overlays, and variants. They are not Python inheritance. They are a
deterministic way to combine slots such as sections, assertions, snippets,
processors, and style guidance.

This is the P10.7 resolver spike for future class/object-style workflows.

## Model

A class can declare:

- `extends`: parent classes
- `slots`: structured values to contribute
- `merge_policies`: per-slot merge behavior

Example:

```yaml
classes:
  base-prd:
    slots:
      sections:
        - Problem
        - Decision
  enterprise:
    extends:
      - base-prd
    slots:
      sections:
        - Compliance
    merge_policies:
      sections: append
```

## Linearization

Multiple inheritance uses a C3-style linearization. That gives us:

- deterministic parent ordering
- monotonic inheritance behavior
- explicit diagnostics for cycles, unknown parents, and inconsistent precedence

The resolved class is merged from base to leaf according to the computed
linearization.
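
As a rough illustration of the ordering described above, the following is a minimal sketch of C3 linearization over a plain `name -> parents` mapping. The function name `c3_linearize`, the dictionary shape, and the use of exceptions instead of structured diagnostics are assumptions made for this example; they are not taken from the resolver itself.

```python
from typing import Dict, List, Tuple


def c3_linearize(name: str, parents: Dict[str, List[str]]) -> List[str]:
    """Return a C3 linearization of `name`, most specific class first.

    Hypothetical helper: `parents` maps each class name to its `extends` list.
    """

    def merge(sequences: List[List[str]]) -> List[str]:
        result: List[str] = []
        sequences = [list(seq) for seq in sequences if seq]
        while sequences:
            for seq in sequences:
                head = seq[0]
                # A head is acceptable only if no sequence has it in its tail.
                if not any(head in other[1:] for other in sequences):
                    break
            else:
                raise ValueError("inconsistent precedence among parents")
            result.append(head)
            sequences = [[c for c in seq if c != head] for seq in sequences]
            sequences = [seq for seq in sequences if seq]
        return result

    def linearize(cls: str, stack: Tuple[str, ...]) -> List[str]:
        if cls not in parents:
            raise KeyError(f"unknown class: {cls}")
        if cls in stack:
            raise ValueError(f"cycle through {cls}")
        bases = parents[cls]
        tails = [linearize(base, stack + (cls,)) for base in bases]
        return [cls] + merge(tails + [list(bases)])

    return linearize(name, ())
```

The result lists the leaf first, as Python's own method resolution order does, so a base-to-leaf merge would walk it in reverse (`reversed(c3_linearize(...))`).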

## Merge Policies

Initial policies:

- `replace`
- `append`
- `prepend`
- `deep_merge`
- `error_on_conflict`

Unknown policies and invalid value shapes produce diagnostics.
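
One plausible reading of the five policy names can be sketched as a single merge function. The semantics below are inferred from the names only: lists for `append`/`prepend`, mappings for `deep_merge`, and equality for `error_on_conflict` are assumptions. The hypothetical `merge_slot` also raises exceptions where the resolver would report diagnostics.

```python
from copy import deepcopy
from typing import Any


def merge_slot(policy: str, base: Any, incoming: Any) -> Any:
    """Combine an inherited slot value with a more specific one (sketch)."""
    if policy == "replace":
        return deepcopy(incoming)
    if policy in ("append", "prepend"):
        if not isinstance(base, list) or not isinstance(incoming, list):
            raise TypeError(f"{policy} needs list values")
        return base + incoming if policy == "append" else incoming + base
    if policy == "deep_merge":
        if not isinstance(base, dict) or not isinstance(incoming, dict):
            raise TypeError("deep_merge needs mapping values")
        merged = deepcopy(base)
        for key, value in incoming.items():
            if isinstance(merged.get(key), dict) and isinstance(value, dict):
                # Recurse only when both sides are mappings.
                merged[key] = merge_slot("deep_merge", merged[key], value)
            else:
                merged[key] = deepcopy(value)
        return merged
    if policy == "error_on_conflict":
        if base != incoming:
            raise ValueError("conflicting slot values")
        return deepcopy(base)
    raise ValueError(f"unknown merge policy: {policy}")
```

Under these assumptions, `merge_slot("append", ["Problem", "Decision"], ["Compliance"])` yields the three sections in base-to-leaf order.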

## CLI

Resolve a class:

```bash
mkt class resolve examples/classes/prd-classes.yaml enterprise-prd
```

JSON/YAML output includes the linearization, merged slots, and diagnostics.

## Extension Boundary

The current resolver does not yet instantiate Markdown documents or inject
snippets. It establishes the deterministic inheritance and merge floor. Later
work can connect resolved slots to contracts, references, processors, and
generation plans.
139
docs/content-references.md
Normal file
@@ -0,0 +1,139 @@
# Content References

Date: 2026-05-04

## Purpose

Content references are the first WP-0010 extension layer. They give Markitect a
shared way to name and resolve Markdown content units without changing the
existing parser, query, transform, compose, include, contract, or cache APIs.

The goal is a small resolver that later features can reuse:

- includes can accept references as well as paths
- explode/implode can write manifests with stable unit IDs
- processors can receive typed units and dependency edges
- tangle/weave can address chunks and generated outputs
- cache and access-control backends can index the same IDs

## Reference Syntax

References are compact strings:

```text
path/to/file.md
path/to/file.md#section:introduction
path/to/file.md::sections[heading=Decision]
std:clauses/payment.md
std:clauses/payment.md#payment-terms
std:clauses/payment.md#region:boilerplate
std:clauses/payment.md#tag:legal
#local-section
```

The parts are:

- `namespace:`: optional namespace declared in frontmatter
- `path`: a Markdown file path relative to the current document, or relative to
  the namespace target
- `#fragment`: optional unit lookup inside the target document
- `::selector`: optional existing Markitect query selector

Fragments and selectors are mutually exclusive during resolution. Selectors are
delegated to the existing query engine, which keeps this layer small and avoids
inventing a second query language.
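
To make the four parts concrete, here is a minimal sketch that splits a reference string into them. `ParsedReference` and `parse_reference_sketch` are hypothetical names for this example and are not the package's `ReferenceAddress` or `parse_reference`. The rule that whichever of `#` or `::` appears first decides the suffix kind is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedReference:
    namespace: Optional[str]
    path: str
    fragment: Optional[str]
    selector: Optional[str]


def parse_reference_sketch(reference: str) -> ParsedReference:
    """Split `[namespace:]path[#fragment | ::selector]` into its parts."""
    fragment = selector = None
    hash_at = reference.find("#")
    sel_at = reference.find("::")
    # Whichever delimiter comes first decides the suffix kind (assumption).
    if hash_at != -1 and (sel_at == -1 or hash_at < sel_at):
        head, fragment = reference[:hash_at], reference[hash_at + 1:]
    elif sel_at != -1:
        head, selector = reference[:sel_at], reference[sel_at + 2:]
    else:
        head = reference
    # Any colon left in the head separates the namespace from the path.
    namespace, sep, path = head.partition(":")
    if not sep:
        return ParsedReference(None, head, fragment, selector)
    return ParsedReference(namespace, path, fragment, selector)
```

A local reference such as `#local-section` comes back with an empty `path`, which a resolver could interpret as "the current document".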

## Namespaces

Namespaces live in Markdown frontmatter:

```yaml
---
namespaces:
  std: ./standard
  product: ../product-docs
---
```

Namespace keys may be written with or without a trailing colon. Namespace values
are string paths. Relative namespace paths resolve under the resolver root. All
resolved file paths must stay inside that root.
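
The containment rule can be sketched with `pathlib` alone. The helper name `resolve_under_root` and its parameters are assumptions made for illustration, and it raises a `ValueError` where the real resolver presumably reports its own error type.

```python
from pathlib import Path


def resolve_under_root(root: Path, namespace_target: str, relative_path: str) -> Path:
    """Resolve a namespaced path and refuse anything outside `root` (sketch)."""
    root = root.resolve()
    # Normalise `..` segments and symlinks before comparing against the root.
    candidate = (root / namespace_target / relative_path).resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"{candidate} escapes resolver root {root}")
    return candidate
```

Resolving before the comparison is the important step: a textual prefix check would accept `standard/../../secrets.md`.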

## Content Units

The resolver currently emits these unit kinds:

- `document`: full Markdown file
- `section`: heading-led Markdown section
- `heading`: heading line
- existing query kinds such as `frontmatter`, `block`, `metrics`, or `section`

Each unit includes:

- `unit_id`: stable local ID
- `kind`
- `source_path`
- source line span when available
- `name`
- `content_hash`
- raw text
- metadata from the source or query match

Heading and section IDs use an explicit trailing heading ID when present:

```markdown
## Payment Terms {#payment-terms}
```

Otherwise the resolver derives a slug from the heading text and adds numeric
suffixes for collisions.
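
A minimal sketch of that ID scheme follows. The exact slug rule (lowercase, non-alphanumeric runs collapsed to `-`) and the suffix format (`-2`, `-3`, ...) are assumptions; the source only says that a slug is derived and that collisions get numeric suffixes. `heading_ids` is a hypothetical helper.

```python
import re
from typing import Dict, List

# Matches a trailing explicit ID such as "{#payment-terms}".
_EXPLICIT_ID = re.compile(r"\s*\{#([A-Za-z0-9_-]+)\}\s*$")


def heading_ids(headings: List[str]) -> List[str]:
    """Assign one ID per heading text, in document order (sketch)."""
    seen: Dict[str, int] = {}
    ids: List[str] = []
    for text in headings:
        match = _EXPLICIT_ID.search(text)
        if match:
            base = match.group(1)
        else:
            base = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-") or "section"
        count = seen.get(base, 0)
        seen[base] = count + 1
        # The first occurrence keeps the bare slug; later ones get a suffix.
        ids.append(base if count == 0 else f"{base}-{count + 1}")
    return ids
```

Because IDs depend on document order, inserting a duplicate heading earlier in a file would shift the suffixes of later ones; explicit `{#id}` markers avoid that.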

Named regions use HTML comments so they can live in Markdown and many source
files without changing the rendered document:

```markdown
<!-- mkt:region id="boilerplate" tags="legal reuse" -->
Reusable text.
<!-- /mkt:region -->
```

Fenced blocks can be addressed when their info string includes an ID:

````markdown
```python {#load-config tags="code setup" tangle="src/config.py"}
def load_config():
    return {}
```
````

Supported fragments now include:

- `#section:<id-or-heading-slug>`
- `#heading:<id-or-heading-slug>`
- `#region:<id>`
- `#fence:<id>`
- `#tag:<tag>`
- `#line:<start>` or `#line:<start>-<end>`
- `#<id>` as a convenience lookup across sections, regions, fenced blocks, and
  headings

## CLI

Resolve a reference from a context document:

```bash
mkt ref resolve examples/references/context.md 'std:clauses.md#payment-terms'
```

JSON and YAML formats include the resolved text and metadata:

```bash
mkt ref resolve examples/references/context.md 'std:clauses.md::sections[heading=Warranty]' --format json
```

## Extension Boundary

This layer is intentionally read-only. It does not replace `mkt include`,
`mkt query`, or `mkt extract`. Instead it defines the address model those tools
can adopt when their next WP-0010 tasks require richer content identity,
processor dependencies, source maps, and reversible manifests.
69
docs/explode-implode.md
Normal file
@@ -0,0 +1,69 @@
# Explode and Implode

Date: 2026-05-04

## Purpose

`mkt explode` and `mkt implode` reintroduce the useful old Markitect
large-document workflow as a slim WP-0010 extension. The design is
manifest-first: the exploded directory is editable, but the manifest preserves
ordering, source spans, heading metadata, hashes, frontmatter, and the selected
layout variant.

This keeps the operation reversible without requiring a database or service.

## Variants

The initial variants are:

- `flat`: writes ordered section files under `sections/`.
- `hierarchical`: writes child section files below parent heading directories.

Both variants preserve the same manifest model. A later semantic variant can
reuse the reference and processor framework once those layers are stable.

## CLI

Explode a document:

```bash
mkt explode docs/source.md --output-dir work/source-exploded
```

Use a hierarchical directory shape:

```bash
mkt explode docs/source.md --output-dir work/source-tree --variant hierarchical
```

Implode the directory back into one Markdown file:

```bash
mkt implode work/source-exploded --output docs/source-rebuilt.md
```

By default `mkt explode` refuses to write into a non-empty output directory. Use
`--force` when an explicit overwrite is intended.

## Manifest

The manifest is written as `markitect-explode.yaml` in the output directory.
It records:

- manifest version
- original source path and SHA-256 hash
- variant
- raw frontmatter block
- ordered entries with file path, kind, unit ID, source line span, heading
  metadata, and content hash

Implode reads the manifest entries in order and concatenates the current entry
files. If users edit section files, the rebuilt document reflects those edits
while preserving the original frontmatter and ordering.
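
The implode step described above can be approximated in a few lines. This is a sketch under stated assumptions: the manifest is passed as an already-loaded dictionary, and the key names `frontmatter`, `entries`, `path`, and `content_hash` are hypothetical stand-ins for whatever `markitect-explode.yaml` actually uses. The plain concatenation without separators is also an assumption.

```python
import hashlib
from pathlib import Path
from typing import Any, Dict, List


def implode_sketch(directory: Path, manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild one Markdown string from manifest-ordered entry files."""
    parts: List[str] = []
    edited: List[str] = []
    frontmatter = manifest.get("frontmatter") or ""
    if frontmatter:
        # The raw frontmatter block comes from the manifest, not from the files.
        parts.append(frontmatter)
    for entry in manifest["entries"]:
        text = (directory / entry["path"]).read_text(encoding="utf-8")
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest != entry["content_hash"]:
            # The file changed since explode; keep the edit but remember it.
            edited.append(entry["path"])
        parts.append(text)
    return {"markdown": "".join(parts), "edited": edited}
```

Comparing each file's current hash with the recorded one is optional for rebuilding, but it shows how the manifest could report which sections were edited after the explode.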

## Extension Boundary

This implementation is intentionally not semantic yet. It does not infer
contracts, classes, named chunks, or processor outputs. Instead it establishes a
small reversible substrate that later WP-0010 tasks can enrich with regions,
references, processors, source maps, and weave/tangle behavior.
79
docs/literate-weave-tangle.md
Normal file
@@ -0,0 +1,79 @@
# Literate Weave and Tangle

Date: 2026-05-04

## Purpose

The literate workflow layer brings a small Knuth-style weave/tangle capability
to Markdown without requiring a separate language. Prose stays in Markdown.
Named code chunks live in fenced blocks. Tangling emits source files.
Weaving keeps the document readable and adds a deterministic chunk index.

## Chunk Syntax

Named chunks use fenced block attributes:

````markdown
```python {#helpers}
def helper():
    return "ready"
```
````

A chunk becomes an output root when it declares `tangle`:

````markdown
```python {#main tangle="src/app.py"}
<<helpers>>

def main():
    return helper()
```
````

Chunk references use noweb-style syntax:

```text
<<helpers>>
```

Whole-line chunk references preserve indentation when expanded.
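
The expansion rule can be sketched as a small recursive function. `expand_chunk`, the `chunks` mapping of chunk name to body text, and the accepted name characters are assumptions for this example. It handles only whole-line references and raises exceptions where the real tangler reports structured diagnostics.

```python
import re
from typing import Dict, List, Tuple

# A reference that is alone on its line, with optional leading indentation.
_WHOLE_LINE_REF = re.compile(r"^(\s*)<<([A-Za-z0-9_.-]+)>>\s*$")


def expand_chunk(name: str, chunks: Dict[str, str], stack: Tuple[str, ...] = ()) -> str:
    """Expand whole-line `<<name>>` references, preserving indentation."""
    if name not in chunks:
        raise KeyError(f"missing chunk: {name}")
    if name in stack:
        raise ValueError("cyclic chunk reference: " + " -> ".join(stack + (name,)))
    lines: List[str] = []
    for line in chunks[name].splitlines():
        match = _WHOLE_LINE_REF.match(line)
        if not match:
            lines.append(line)
            continue
        indent, ref = match.groups()
        for inner in expand_chunk(ref, chunks, stack + (name,)).splitlines():
            # Re-indent each expanded line; blank lines stay empty.
            lines.append(indent + inner if inner else inner)
    return "\n".join(lines)
```

With the `helpers` and `main` chunks shown earlier, expanding `main` would place the body of `helpers` where the `<<helpers>>` line stood.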

## CLI

Tangle files:

```bash
mkt tangle examples/literate/app.md --output-dir build/literate
```

Inspect without writing:

```bash
mkt tangle examples/literate/app.md --format json
```

Weave documentation:

```bash
mkt weave examples/literate/app.md --output build/app-woven.md
```

## Diagnostics

Tangling reports structured diagnostics for missing chunks and cyclic chunk
references. Tangled files are only written by the CLI when the result is valid.

## Extension Boundary

The MVP deliberately keeps the model narrow:

- named fenced blocks
- `tangle="<path>"`
- deterministic document-order concatenation for repeated targets
- noweb-style chunk expansion
- generated chunk index during weave

Future extensions can add richer source maps, processor execution,
language-specific extraction, and class/namespace-aware chunk selection without
changing this initial chunk model.
46
docs/markitect-main-wp0010-migration-notes.md
Normal file
@@ -0,0 +1,46 @@
# markitect-main WP-0010 Migration Notes

Date: 2026-05-04

## Purpose

This note captures the relevant `markitect-main` ideas that WP-0010 now
preserves in successor form.

The migration is conceptual rather than source-compatible. The successor keeps
Markdown-native behavior and removes old platform, database, infospace, and
service assumptions.

## Parity Map

| Legacy area | Successor shape | Status |
| --- | --- | --- |
| Explode/implode variants | `mkt explode`, `mkt implode`, manifest-first flat/hierarchical variants | Reimplemented |
| Transclusion/includes | `mkt include` for path markers; processor `mkt-include` for reference-backed content | Reimplemented with clearer boundaries |
| Spaces/infospace references | Frontmatter namespaces plus `mkt ref resolve` | Reframed as syntax-layer references |
| Fenced-block processors | Explicit deterministic processor registry | Reimplemented as opt-in extension |
| Literate workflows | `mkt tangle`, `mkt weave`, named fenced chunks, noweb references | Reimplemented as MVP |
| Content classes/overlays | Data-defined classes with C3-style linearization and merge policies | Resolver spike implemented |

## Intentionally Not Migrated

These old concerns stay out of the WP-0010 toolkit layer:

- database-backed infospace lifecycle
- GraphQL/service APIs
- provider-specific LLM execution
- rendering/plugin/browser/editor infrastructure
- project finance, wishlist, and profile tooling

## Migration Examples

Examples live under `examples/migration/`:

- `legacy-explode-source.md`: large document roundtrip via explode/implode.
- `legacy-transclusion-context.md`: namespace-backed reference include.
- `legacy-path-include.md`: simple path-based include marker.
- `legacy-literate.md`: named chunks tangled into source.

The tests in `tests/test_wp0010_migration_examples.py` exercise these files as
successor fixtures. They are deliberately small, but they lock down the
behaviors we most wanted to keep from `markitect-main`.
81
docs/processors.md
Normal file
@@ -0,0 +1,81 @@
# Fenced-Block Processors

Date: 2026-05-04

## Purpose

The processor registry is the deterministic execution boundary for WP-0010.
It lets Markdown fenced blocks opt into named processors while keeping
execution explicit, inspectable, and non-magical.

Processors receive:

- the fenced content unit
- resolver-capable context
- variables and policy maps

Processors return:

- generated content
- optional generated files
- diagnostics
- dependencies
- operation provenance

No built-in processor runs arbitrary code.

## Syntax

A fenced block opts into processing by using an `mkt-<processor>` language:

````markdown
```mkt-uppercase {#shout}
hello
```
````

The processor can also be named with attributes:

````markdown
```markdown {#example processor="identity"}
Rendered as-is by the identity processor.
```
````

## Built-In Processors

Initial deterministic processors:

- `identity`: returns the fenced block content unchanged.
- `uppercase`: returns uppercased content; mainly a registry smoke-test.
- `include`: resolves a `ref` attribute through the content reference resolver.

Reference-backed include:

````markdown
```mkt-include {#payment ref="std:clauses.md#payment-terms"}
```
````

The include processor returns the resolved content, records the target file as
a dependency, and emits operation provenance.
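
The opt-in rule and the shared result envelope can be sketched as follows. Everything named here (`SketchResult`, `REGISTRY`, `processor_name`, `run_block`) is hypothetical and far simpler than the package's `ProcessorRegistry` and `ProcessorResult`. The sketch covers only the two processors that need no resolver and assumes the fence attributes arrive as an already-parsed dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SketchResult:
    """A reduced result envelope: content plus bookkeeping lists."""
    content: str = ""
    diagnostics: List[str] = field(default_factory=list)
    dependencies: List[str] = field(default_factory=list)
    provenance: List[str] = field(default_factory=list)


Processor = Callable[[str, Dict[str, str]], SketchResult]

# Only explicitly registered names can run; nothing is looked up dynamically.
REGISTRY: Dict[str, Processor] = {
    "identity": lambda text, attrs: SketchResult(text, provenance=["identity"]),
    "uppercase": lambda text, attrs: SketchResult(text.upper(), provenance=["uppercase"]),
}


def processor_name(language: str, attrs: Dict[str, str]) -> Optional[str]:
    """Return the requested processor, or None for an ordinary fenced block."""
    if language.startswith("mkt-"):
        return language[len("mkt-"):]
    return attrs.get("processor")


def run_block(language: str, attrs: Dict[str, str], text: str) -> Optional[SketchResult]:
    name = processor_name(language, attrs)
    if name is None:
        return None  # the block did not opt in
    processor = REGISTRY.get(name)
    if processor is None:
        return SketchResult(diagnostics=[f"unknown processor: {name}"])
    return processor(text, attrs)
```

Returning the same envelope for failures as for successes keeps diagnostics, dependencies, and provenance in one place regardless of which processor ran.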

## CLI

Run processors in a document:

```bash
mkt process examples/references/context.md --format json
```

Text output reports processor validity, block IDs, and the first generated
content line. JSON/YAML output includes diagnostics, dependencies, and
provenance.

## Extension Boundary

The registry is deliberately small. It does not render a final document yet and
does not execute shell, Python, SQL, or LLM calls. Those can become opt-in
processors later, but they should use the same result envelope so diagnostics,
dependencies, provenance, cache invalidation, and access-control hooks stay
consistent.
@@ -27,6 +27,10 @@ Supported operations:
The API equivalent is `transform_markdown(...)`.

Heading shifts are token-safe: Markdown fenced and indented code blocks are
left untouched even if their lines look like headings. `TransformResult`
includes structured provenance events alongside the older operation-name list.
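
The idea behind a token-safe heading shift can be illustrated with a line scanner that tracks fence state. This is a simplified sketch, not the way `transform_markdown(...)` is implemented: `shift_headings` is a hypothetical helper, it only recognises ATX headings, and it treats a closing fence more loosely than CommonMark does.

```python
import re

_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")
# ATX headings may be indented by at most three spaces, so lines indented by
# four or more (indented code) never match this pattern.
_HEADING = re.compile(r"^( {0,3})(#{1,6})(\s.*)?$")


def shift_headings(markdown: str, levels: int) -> str:
    """Shift ATX heading depth by `levels`, skipping code blocks."""
    out = []
    fence = None  # the opening fence marker while inside a fenced block
    for line in markdown.split("\n"):
        fence_match = _FENCE.match(line)
        if fence is None and fence_match:
            fence = fence_match.group(1)
        elif fence is not None:
            # Close only on a bare fence of the same character, at least as long.
            stripped = line.strip()
            if stripped and set(stripped) == {fence[0]} and len(stripped) >= len(fence):
                fence = None
        else:
            heading = _HEADING.match(line)
            if heading:
                indent, hashes, rest = heading.groups()
                depth = min(6, max(1, len(hashes) + levels))
                line = f"{indent}{'#' * depth}{rest or ''}"
        out.append(line)
    return "\n".join(out)
```

Clamping the depth to the range 1 to 6 is a design choice of this sketch; a stricter tool might report a diagnostic instead.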

## Compose

Use `mkt compose` to concatenate Markdown inputs with predictable separators:
@@ -79,5 +83,12 @@ Resolution rules:
  directory.
- Recursive includes are resolved up to `--max-depth`.
- Cycles and missing files fail with explicit errors.
- Include markers inside fenced or indented code blocks are left literal.

The API equivalent is `resolve_includes(...)`.

`IncludeResult` includes structured provenance events. Each include event
records the source marker line when available, the resolved target path,
dependency edge, selector, heading shift, and frontmatter policy. This is the
first provenance envelope used by later WP-0010 processor, source-map, and
explode/implode work.
@@ -32,7 +32,7 @@ and descriptions mirror the operational view.
| `MKTT-WP-0004` | complete | done | `MKTT-WP-0001`, `MKTT-WP-0002` | Contract framework is complete and informs later validation/generation work. |
| `MKTT-WP-0003` | complete | done | `MKTT-WP-0001`, `MKTT-WP-0002`, `MKTT-WP-0004` | Core toolkit implementation is complete. |
| `MKTT-WP-0006` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T005` | Ready after transform/composition shape is clear; should account for future reference/provenance needs. |
| `MKTT-WP-0010` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Trigger is satisfied; keep as the richer content-reference, processor, explode/implode, and weave/tangle track. |
| `MKTT-WP-0010` | complete | done | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Content references, processors, explode/implode, weave/tangle, content classes, and migration examples are complete as the first WP-0010 extension layer. |
| `MKTT-WP-0007` | P2 | todo | `MKTT-WP-0006` | First practical cache backend use case: AST/JSONPath/SQLite/FTS. |
| `MKTT-WP-0005` | P2 | todo | `MKTT-WP-0003`, `MKTT-WP-0004` | Pick up when generation/form/context or semantic assessment pressure appears. |
| `MKTT-WP-0011` | P2 | todo | `MKTT-WP-0003`; task-level triggers: `MKTT-WP-0010-T001`, `MKTT-WP-0010-T005` | Declarative Markdown dataflow workflows: source extraction, deterministic/assisted processing, and multi-output generation. |
30
examples/classes/prd-classes.yaml
Normal file
@@ -0,0 +1,30 @@
classes:
  base-prd:
    slots:
      sections:
        - Problem
        - Decision
      assertions:
        tone: plain
        audience: product

  enterprise:
    extends:
      - base-prd
    slots:
      sections:
        - Compliance
      assertions:
        audience: enterprise buyers
    merge_policies:
      sections: append
      assertions: deep_merge

  enterprise-prd:
    extends:
      - enterprise
    slots:
      sections:
        - Rollout
    merge_policies:
      sections: append
15
examples/literate/app.md
Normal file
@@ -0,0 +1,15 @@
# Literate App Example

This example explains the helper before showing the application entry point.

```python {#helpers}
def helper():
    return "ready"
```

```python {#main tangle="src/app.py"}
<<helpers>>

def main():
    return helper()
```
17
examples/migration/legacy-explode-source.md
Normal file
@@ -0,0 +1,17 @@
---
title: Legacy Explode Successor
---

Opening material that used to be easy to lose in section-only exports.

# Overview

The successor explode flow preserves preamble, headings, order, and frontmatter.

## Detail

Nested sections remain addressable and roundtrip through the manifest.

# Follow-Up

Later sections keep their document order.
12
examples/migration/legacy-literate.md
Normal file
@@ -0,0 +1,12 @@
# Legacy Literate Successor

```python {#config}
CONFIG = {"ready": True}
```

```python {#main tangle="src/app.py"}
<<config>>

def main():
    return CONFIG["ready"]
```
3
examples/migration/legacy-path-include.md
Normal file
@@ -0,0 +1,3 @@
# Path Include

<!-- mkt:include path="standard/clauses.md" selector="sections[heading~=Warranty]" -->
13
examples/migration/legacy-transclusion-context.md
Normal file
@@ -0,0 +1,13 @@
---
title: Legacy Transclusion Successor
namespaces:
  std: ./standard
---

# Contract Draft

The old broad transclusion idea is now split into path includes and
reference-backed processors.

```mkt-include {#payment-clause ref="std:clauses.md#payment"}
```
9
examples/migration/standard/clauses.md
Normal file
@@ -0,0 +1,9 @@
# Standard Clauses

## Payment {#payment}

Payment is due within 30 days.

## Warranty {#warranty}

Warranty begins on the effective date.
26
examples/references/context.md
Normal file
@@ -0,0 +1,26 @@
---
title: Reference Context
namespaces:
  std: ./standard
---

# Reference Context

This document declares the namespaces used by reference examples.

## Local Overview

Local sections can be addressed with `#local-overview`.

<!-- mkt:region id="summary-snippet" tags="reuse summary" -->
This named region can be resolved with `#region:summary-snippet` or
`#tag:summary`.
<!-- /mkt:region -->

```python {#example-loader tags="code demo" tangle="src/example_loader.py"}
def load_example():
    return "ready"
```

```mkt-include {#payment-example ref="std:clauses.md#payment-terms"}
```
9
examples/references/standard/clauses.md
Normal file
@@ -0,0 +1,9 @@
# Standard Clauses

## Payment Terms {#payment-terms}

Payment is due within 30 days unless a governing contract says otherwise.

## Warranty

The warranty period starts on the effective date.
@@ -32,7 +32,26 @@ from markitect_tool.cache import (
    save_cache,
    scan_markdown_files,
)
from markitect_tool.content_class import (
    ClassCompositionResult,
    ContentClass,
    ContentClassRegistry,
    ContentClassResolutionError,
    load_content_class_file,
    load_content_classes,
)
from markitect_tool.diagnostics import Diagnostic, SourceLocation
from markitect_tool.explode import (
    EXPLODE_MANIFEST_NAME,
    ExplodeEntry,
    ExplodeError,
    ExplodeManifest,
    ExplodeResult,
    ImplodeResult,
    explode_markdown_file,
    implode_markdown_directory,
    load_explode_manifest,
)
from markitect_tool.generation import (
    GeneratedDocument,
    GenerationHookRequest,
@@ -44,21 +63,55 @@ from markitect_tool.generation import (
    load_generation_plan_file,
    run_generation_plan,
)
from markitect_tool.literate import (
    CodeChunk,
    LiterateFile,
    TangleResult,
    WeaveResult,
    discover_code_chunks,
    tangle_markdown,
    weave_markdown,
    write_tangle_files,
)
from markitect_tool.ops import (
    ComposeResult,
    IncludeError,
    IncludeResult,
    OperationProvenance,
    TransformResult,
    compose_files,
    resolve_includes,
    transform_markdown,
)
from markitect_tool.processor import (
    FencedProcessorBlock,
    ProcessorContext,
    ProcessorOutputFile,
    ProcessorRegistry,
    ProcessorRequest,
    ProcessorResult,
    ProcessorRun,
    default_processor_registry,
    discover_fenced_processors,
    run_fenced_processors,
)
from markitect_tool.query import (
    InvalidQueryError,
    QueryMatch,
    extract_document,
    query_document,
)
from markitect_tool.reference import (
    ContentUnit,
    ReferenceAddress,
    ReferenceContext,
    ReferenceResolution,
    ReferenceResolutionError,
    SourceSpan as ReferenceSourceSpan,
    load_namespaces,
    parse_reference,
    resolve_reference,
)
from markitect_tool.schema import (
    MarkdownSchema,
    SchemaValidationResult,
@@ -109,8 +162,23 @@ __all__ = [
    "load_cache",
    "save_cache",
    "scan_markdown_files",
    "ClassCompositionResult",
    "ContentClass",
    "ContentClassRegistry",
    "ContentClassResolutionError",
    "load_content_class_file",
    "load_content_classes",
    "Diagnostic",
    "SourceLocation",
    "EXPLODE_MANIFEST_NAME",
    "ExplodeEntry",
    "ExplodeError",
    "ExplodeManifest",
    "ExplodeResult",
    "ImplodeResult",
    "explode_markdown_file",
    "implode_markdown_directory",
    "load_explode_manifest",
    "GeneratedDocument",
    "GenerationHookRequest",
    "GenerationHookResult",
@@ -120,17 +188,45 @@ __all__ = [
    "generate_with_hook",
    "load_generation_plan_file",
    "run_generation_plan",
    "CodeChunk",
    "LiterateFile",
    "TangleResult",
    "WeaveResult",
    "discover_code_chunks",
    "tangle_markdown",
    "weave_markdown",
    "write_tangle_files",
    "ComposeResult",
    "IncludeError",
    "IncludeResult",
    "OperationProvenance",
    "TransformResult",
    "compose_files",
    "resolve_includes",
    "transform_markdown",
    "FencedProcessorBlock",
    "ProcessorContext",
    "ProcessorOutputFile",
    "ProcessorRegistry",
    "ProcessorRequest",
    "ProcessorResult",
    "ProcessorRun",
    "default_processor_registry",
    "discover_fenced_processors",
    "run_fenced_processors",
    "InvalidQueryError",
    "QueryMatch",
    "extract_document",
    "query_document",
    "ContentUnit",
    "ReferenceAddress",
    "ReferenceContext",
    "ReferenceResolution",
    "ReferenceResolutionError",
    "ReferenceSourceSpan",
    "load_namespaces",
    "parse_reference",
    "resolve_reference",
    "MissingTemplateVariable",
    "TemplateAnalysis",
    "TemplateError",
@@ -16,6 +16,10 @@ from markitect_tool.cache import (
    load_cache,
    save_cache,
)
from markitect_tool.content_class import (
    ContentClassResolutionError,
    load_content_class_file,
)
from markitect_tool.core import parse_markdown_file
from markitect_tool.contract import (
    ContractLoaderError,
@@ -24,6 +28,11 @@ from markitect_tool.contract import (
    load_contract_file,
    validate_contract,
)
from markitect_tool.explode import (
    ExplodeError,
    explode_markdown_file,
    implode_markdown_directory,
)
from markitect_tool.generation import (
    GenerationPlanError,
    generate_stub_from_contract,
@@ -31,8 +40,16 @@ from markitect_tool.generation import (
    load_generation_plan_file,
    run_generation_plan,
)
from markitect_tool.literate import tangle_markdown, weave_markdown, write_tangle_files
from markitect_tool.ops import IncludeError, compose_files, resolve_includes, transform_markdown
from markitect_tool.processor import ProcessorContext, run_fenced_processors
from markitect_tool.query import InvalidQueryError, extract_document, query_document
from markitect_tool.reference import (
    ReferenceContext,
    ReferenceResolutionError,
    load_namespaces,
    resolve_reference,
)
from markitect_tool.schema import load_schema_file, validate_markdown_file, validate_schema
from markitect_tool.template import (
    MissingTemplateVariable,
@@ -296,6 +313,224 @@ def include(
|
||||
_emit_markdown_result(result.to_dict(), output_format, output)
|
||||
|
||||
|
||||
@main.command()
|
||||
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
|
||||
@click.option(
|
||||
"--output-dir",
|
||||
required=True,
|
||||
type=click.Path(file_okay=False, path_type=Path),
|
||||
help="Directory to write exploded Markdown files and manifest into.",
|
||||
)
|
||||
@click.option(
|
||||
"--variant",
|
||||
type=click.Choice(["flat", "hierarchical"], case_sensitive=False),
|
||||
default="flat",
|
||||
show_default=True,
|
||||
)
|
||||
@click.option("--force", is_flag=True, help="Allow writing into a non-empty output directory.")
|
||||
@click.option(
|
||||
"--format",
|
||||
"output_format",
|
||||
type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
|
||||
default="text",
|
||||
show_default=True,
|
||||
)
|
||||
def explode(
|
||||
file: Path,
|
||||
output_dir: Path,
|
||||
variant: str,
|
||||
force: bool,
|
||||
output_format: str,
|
||||
) -> None:
|
||||
"""Explode a Markdown file into reversible section files."""
|
||||
|
||||
try:
|
||||
result = explode_markdown_file(file, output_dir, variant=variant, overwrite=force)
|
||||
except ExplodeError as exc:
|
||||
raise click.ClickException(str(exc)) from exc
|
||||
_emit_explode_result(result.to_dict(), output_format)
|
||||
|
||||
|
||||
@main.command()
@click.argument("directory", type=click.Path(exists=True, file_okay=False, path_type=Path))
@click.option(
    "--manifest",
    "manifest_path",
    type=click.Path(exists=True, dir_okay=False, path_type=Path),
    help="Manifest path. Defaults to markitect-explode.yaml in the input directory.",
)
@click.option(
    "--output",
    type=click.Path(dir_okay=False, path_type=Path),
    help="Write imploded Markdown to a file.",
)
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["markdown", "json", "yaml"], case_sensitive=False),
    default="markdown",
    show_default=True,
)
def implode(
    directory: Path,
    manifest_path: Path | None,
    output: Path | None,
    output_format: str,
) -> None:
    """Implode a Markdown directory created by `mkt explode`."""

    try:
        result = implode_markdown_directory(directory, manifest_path=manifest_path)
    except ExplodeError as exc:
        raise click.ClickException(str(exc)) from exc
    _emit_markdown_result(result.to_dict(), output_format, output)


@main.group("ref")
def ref_group() -> None:
    """Resolve namespaced Markdown content references."""


@ref_group.command("resolve")
@click.argument("context_file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.argument("reference")
@click.option(
    "--root",
    type=click.Path(exists=True, file_okay=False, path_type=Path),
    default=Path("."),
    show_default=True,
    help="Root that relative paths and namespaces must stay within.",
)
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
    default="text",
    show_default=True,
)
def ref_resolve(context_file: Path, reference: str, root: Path, output_format: str) -> None:
    """Resolve a content reference using a Markdown document as context."""

    context_document = parse_markdown_file(context_file)
    context = ReferenceContext.from_document(
        context_document,
        root=root,
        current_path=context_file,
    )
    try:
        resolution = resolve_reference(reference, context=context)
    except ReferenceResolutionError as exc:
        raise click.ClickException(str(exc)) from exc
    _emit_reference_result(resolution.to_dict(), output_format)


@main.command("process")
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option(
    "--root",
    type=click.Path(exists=True, file_okay=False, path_type=Path),
    default=Path("."),
    show_default=True,
    help="Root used for relative processor references.",
)
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
    default="text",
    show_default=True,
)
def process(file: Path, root: Path, output_format: str) -> None:
    """Run deterministic fenced-block processors in a Markdown file."""

    document = parse_markdown_file(file)
    context = ProcessorContext(
        root=root,
        current_path=file,
        namespaces=load_namespaces(document.frontmatter),
    )
    result = run_fenced_processors(
        file.read_text(encoding="utf-8"),
        context=context,
        source_path=file,
    )
    _emit_processor_run(result.to_dict(), output_format)
    raise click.exceptions.Exit(0 if result.valid else 1)


@main.group("class")
def class_group() -> None:
    """Resolve deterministic content classes."""


@class_group.command("resolve")
@click.argument("class_file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.argument("class_name")
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
    default="text",
    show_default=True,
)
def class_resolve(class_file: Path, class_name: str, output_format: str) -> None:
    """Resolve content class inheritance and merged slots."""

    try:
        registry = load_content_class_file(class_file)
        result = registry.compose(class_name)
    except ContentClassResolutionError as exc:
        raise click.ClickException(str(exc)) from exc
    _emit_content_class_result(result.to_dict(), output_format)
    raise click.exceptions.Exit(0 if result.valid else 1)


@main.command()
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option(
    "--output-dir",
    type=click.Path(file_okay=False, path_type=Path),
    help="Write tangled files under this directory. Omit for dry JSON/YAML/text output.",
)
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
    default="text",
    show_default=True,
)
def tangle(file: Path, output_dir: Path | None, output_format: str) -> None:
    """Tangle named Markdown code chunks into target files."""

    result = tangle_markdown(file.read_text(encoding="utf-8"), source_path=file)
    data = result.to_dict()
    if output_dir and result.valid:
        data["written_files"] = write_tangle_files(result, output_dir)
    _emit_tangle_result(data, output_format)
    raise click.exceptions.Exit(0 if result.valid else 1)


@main.command()
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option(
    "--output",
    type=click.Path(dir_okay=False, path_type=Path),
    help="Write woven Markdown to a file.",
)
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["markdown", "json", "yaml"], case_sensitive=False),
    default="markdown",
    show_default=True,
)
def weave(file: Path, output: Path | None, output_format: str) -> None:
    """Weave Markdown documentation with a deterministic chunk index."""

    result = weave_markdown(file.read_text(encoding="utf-8"), source_path=file)
    _emit_markdown_result(result.to_dict(), output_format, output)


@main.group()
def cache() -> None:
    """Fingerprint Markdown files and detect changed inputs."""
@@ -788,6 +1023,83 @@ def _emit_cache_data(data: dict, output_format: str) -> None:
|
||||
click.echo(f"written: {data['written']}")
|
||||
|
||||
|
||||
def _emit_reference_result(data: dict, output_format: str) -> None:
    if output_format == "json":
        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
    elif output_format == "yaml":
        click.echo(yaml.safe_dump(data, sort_keys=False))
    else:
        click.echo(f"{data['count']} unit(s)")
        click.echo(f"target: {data['target_path']}")
        for unit in data["units"]:
            span = unit.get("span", {})
            line = f":{span['line_start']}" if span.get("line_start") else ""
            click.echo(f"- {unit['kind']} {unit['unit_id']} {unit['source_path']}{line}")
            if unit.get("name"):
                click.echo(f" {unit['name']}")


def _emit_explode_result(data: dict, output_format: str) -> None:
    if output_format == "json":
        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
    elif output_format == "yaml":
        click.echo(yaml.safe_dump(data, sort_keys=False))
    else:
        manifest = data["manifest"]
        click.echo(f"manifest: {data['manifest_path']}")
        click.echo(f"variant: {manifest['variant']}")
        click.echo(f"entries: {len(manifest['entries'])}")
        for entry in manifest["entries"]:
            click.echo(f"- {entry['kind']} {entry['file']}")


def _emit_processor_run(data: dict, output_format: str) -> None:
    if output_format == "json":
        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
    elif output_format == "yaml":
        click.echo(yaml.safe_dump(data, sort_keys=False))
    else:
        click.echo("valid" if data["valid"] else "invalid")
        click.echo(f"processors: {data['count']}")
        for block, result in zip(data["blocks"], data["results"], strict=False):
            line = f":{block['line_start']}" if block.get("line_start") else ""
            click.echo(f"- {block['processor']} {block['unit_id']}{line}")
            if result.get("content"):
                click.echo(f" content: {result['content'].splitlines()[0]}")
            for diagnostic in result.get("diagnostics", []):
                click.echo(f" [{diagnostic['severity']}] {diagnostic['code']}: {diagnostic['message']}")


def _emit_content_class_result(data: dict, output_format: str) -> None:
    if output_format == "json":
        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
    elif output_format == "yaml":
        click.echo(yaml.safe_dump(data, sort_keys=False))
    else:
        click.echo("valid" if data["valid"] else "invalid")
        click.echo("linearization: " + " -> ".join(data["linearization"]))
        for slot, value in data.get("slots", {}).items():
            click.echo(f"- {slot}: {value}")
        for diagnostic in data.get("diagnostics", []):
            click.echo(f"! [{diagnostic['severity']}] {diagnostic['code']}: {diagnostic['message']}")


def _emit_tangle_result(data: dict, output_format: str) -> None:
    if output_format == "json":
        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
    elif output_format == "yaml":
        click.echo(yaml.safe_dump(data, sort_keys=False))
    else:
        click.echo("valid" if data["valid"] else "invalid")
        click.echo(f"files: {len(data['files'])}")
        for file in data["files"]:
            click.echo(f"- {file['path']}: {', '.join(file['chunk_ids'])}")
        for diagnostic in data.get("diagnostics", []):
            click.echo(f"! [{diagnostic['severity']}] {diagnostic['code']}: {diagnostic['message']}")
        for written in data.get("written_files", []):
            click.echo(f"written: {written}")


def _emit_jsonish(data: dict, output_format: str) -> None:
    if output_format == "yaml":
        click.echo(yaml.safe_dump(data, sort_keys=False))
|
||||
19
src/markitect_tool/content_class/__init__.py
Normal file
@@ -0,0 +1,19 @@
"""Deterministic content class composition."""

from markitect_tool.content_class.engine import (
    ClassCompositionResult,
    ContentClass,
    ContentClassRegistry,
    ContentClassResolutionError,
    load_content_class_file,
    load_content_classes,
)

__all__ = [
    "ClassCompositionResult",
    "ContentClass",
    "ContentClassRegistry",
    "ContentClassResolutionError",
    "load_content_class_file",
    "load_content_classes",
]
225
src/markitect_tool/content_class/engine.py
Normal file
@@ -0,0 +1,225 @@
"""Small deterministic content class resolver."""

from __future__ import annotations

from copy import deepcopy
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any

import yaml

from markitect_tool.diagnostics import Diagnostic


class ContentClassResolutionError(ValueError):
    """Raised when content class definitions cannot be loaded."""


@dataclass(frozen=True)
class ContentClass:
    """A data-defined content class."""

    name: str
    extends: list[str] = field(default_factory=list)
    slots: dict[str, Any] = field(default_factory=dict)
    merge_policies: dict[str, str] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        return {key: value for key, value in asdict(self).items() if value not in ({}, [], None)}


@dataclass(frozen=True)
class ClassCompositionResult:
    """Resolved content class slots plus diagnostics."""

    class_name: str
    linearization: list[str]
    slots: dict[str, Any]
    diagnostics: list[Diagnostic] = field(default_factory=list)

    @property
    def valid(self) -> bool:
        return not any(diagnostic.severity == "error" for diagnostic in self.diagnostics)

    def to_dict(self) -> dict[str, Any]:
        return {
            "valid": self.valid,
            "class_name": self.class_name,
            "linearization": self.linearization,
            "slots": self.slots,
            "diagnostics": [diagnostic.to_dict() for diagnostic in self.diagnostics],
        }


class ContentClassRegistry:
    """Registry and resolver for content classes."""

    def __init__(self, classes: dict[str, ContentClass] | None = None) -> None:
        self.classes = classes or {}

    def add(self, content_class: ContentClass) -> None:
        self.classes[content_class.name] = content_class

    def linearize(self, class_name: str) -> list[str]:
        if class_name not in self.classes:
            raise ContentClassResolutionError(f"Unknown content class `{class_name}`")
        return self._linearize(class_name, [])

    def compose(self, class_name: str) -> ClassCompositionResult:
        diagnostics: list[Diagnostic] = []
        try:
            linearization = self.linearize(class_name)
        except ContentClassResolutionError as exc:
            return ClassCompositionResult(
                class_name=class_name,
                linearization=[],
                slots={},
                diagnostics=[
                    Diagnostic(
                        severity="error",
                        code="content_class.resolution_error",
                        message=str(exc),
                    )
                ],
            )

        slots: dict[str, Any] = {}
        for name in reversed(linearization):
            content_class = self.classes[name]
            for slot, value in content_class.slots.items():
                policy = content_class.merge_policies.get(slot, "replace")
                try:
                    slots[slot] = _merge_slot(slots.get(slot), value, policy)
                except ContentClassResolutionError as exc:
                    diagnostics.append(
                        Diagnostic(
                            severity="error",
                            code="content_class.merge_conflict",
                            message=str(exc),
                            details={"class": name, "slot": slot, "policy": policy},
                        )
                    )
        return ClassCompositionResult(
            class_name=class_name,
            linearization=linearization,
            slots=slots,
            diagnostics=diagnostics,
        )

    def _linearize(self, class_name: str, stack: list[str]) -> list[str]:
        if class_name in stack:
            raise ContentClassResolutionError(
                "Cyclic content class inheritance: " + " -> ".join(stack + [class_name])
            )
        content_class = self.classes[class_name]
        parent_mros = [
            self._linearize(parent, stack + [class_name])
            for parent in content_class.extends
            if _known_parent(parent, self.classes)
        ]
        missing = [parent for parent in content_class.extends if parent not in self.classes]
        if missing:
            raise ContentClassResolutionError(
                f"Content class `{class_name}` extends unknown class(es): {', '.join(missing)}"
            )
        return [class_name] + _c3_merge(parent_mros + [list(content_class.extends)])


def load_content_class_file(path: str | Path) -> ContentClassRegistry:
    """Load content class definitions from YAML."""

    data = yaml.safe_load(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ContentClassResolutionError("Content class file must be a mapping")
    return load_content_classes(data)


def load_content_classes(data: dict[str, Any]) -> ContentClassRegistry:
    """Load content class definitions from a mapping."""

    raw_classes = data.get("classes", data)
    if not isinstance(raw_classes, dict):
        raise ContentClassResolutionError("Content classes must be a mapping")
    classes: dict[str, ContentClass] = {}
    for name, raw_class in raw_classes.items():
        if not isinstance(raw_class, dict):
            raise ContentClassResolutionError(f"Content class `{name}` must be a mapping")
        extends = raw_class.get("extends", [])
        if isinstance(extends, str):
            extends = [extends]
        if not isinstance(extends, list):
            raise ContentClassResolutionError(f"Content class `{name}` extends must be a list")
        slots = raw_class.get("slots", {})
        policies = raw_class.get("merge_policies", {})
        if not isinstance(slots, dict) or not isinstance(policies, dict):
            raise ContentClassResolutionError(
                f"Content class `{name}` slots and merge_policies must be mappings"
            )
        classes[str(name)] = ContentClass(
            name=str(name),
            extends=[str(parent) for parent in extends],
            slots=slots,
            merge_policies={str(key): str(value) for key, value in policies.items()},
        )
    return ContentClassRegistry(classes)


def _c3_merge(sequences: list[list[str]]) -> list[str]:
    result: list[str] = []
    sequences = [list(sequence) for sequence in sequences if sequence]
    while sequences:
        candidate = None
        for sequence in sequences:
            head = sequence[0]
            if not any(head in other[1:] for other in sequences):
                candidate = head
                break
        if candidate is None:
            raise ContentClassResolutionError("Inconsistent content class precedence order")
        result.append(candidate)
        sequences = [
            [item for item in sequence if item != candidate]
            for sequence in sequences
        ]
        sequences = [sequence for sequence in sequences if sequence]
    return result


def _merge_slot(existing: Any, value: Any, policy: str) -> Any:
    incoming = deepcopy(value)
    if existing is None:
        return incoming
    if policy == "replace":
        return incoming
    if policy == "append":
        return _as_list(existing) + _as_list(incoming)
    if policy == "prepend":
        return _as_list(incoming) + _as_list(existing)
    if policy == "deep_merge":
        if not isinstance(existing, dict) or not isinstance(incoming, dict):
            raise ContentClassResolutionError("deep_merge requires mapping values")
        return _deep_merge(existing, incoming)
    if policy == "error_on_conflict":
        if existing != incoming:
            raise ContentClassResolutionError("slot conflict")
        return existing
    raise ContentClassResolutionError(f"Unknown merge policy `{policy}`")


def _deep_merge(left: dict[str, Any], right: dict[str, Any]) -> dict[str, Any]:
    merged = deepcopy(left)
    for key, value in right.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = _deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def _as_list(value: Any) -> list[Any]:
    return value if isinstance(value, list) else [value]


def _known_parent(parent: str, classes: dict[str, ContentClass]) -> bool:
    return parent in classes
|
||||
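The resolver above combines a C3-style linearization with a base-to-leaf slot merge. A minimal standalone sketch of that idea follows; it is not the module's code. The class names (`base`, `security`, `enterprise`, `enterprise-secure`), the `CLASSES` mapping, and the helper names `c3_merge`, `linearize`, and `compose` are hypothetical, only the `replace` and `append` policies are modelled, and cycle and unknown-parent checks are left out for brevity.

```python
from copy import deepcopy
from typing import Any


def c3_merge(sequences: list[list[str]]) -> list[str]:
    """Repeatedly take the first head that appears in no other sequence's tail."""
    result: list[str] = []
    pending = [list(seq) for seq in sequences if seq]
    while pending:
        for seq in pending:
            head = seq[0]
            if not any(head in other[1:] for other in pending):
                break
        else:
            # No head qualified: the parents disagree about ordering.
            raise ValueError("inconsistent precedence order")
        result.append(head)
        pending = [[item for item in seq if item != head] for seq in pending]
        pending = [seq for seq in pending if seq]
    return result


def linearize(name: str, extends: dict[str, list[str]]) -> list[str]:
    """Linearize an acyclic hierarchy (assumption: no cycles, all parents known)."""
    parents = extends[name]
    return [name] + c3_merge([linearize(p, extends) for p in parents] + [list(parents)])


def compose(name: str, classes: dict[str, dict[str, Any]]) -> tuple[list[str], dict[str, Any]]:
    """Merge slots from the most basic class to the leaf."""
    extends = {key: spec.get("extends", []) for key, spec in classes.items()}
    order = linearize(name, extends)
    slots: dict[str, Any] = {}
    for class_name in reversed(order):
        spec = classes[class_name]
        for slot, value in spec.get("slots", {}).items():
            policy = spec.get("merge_policies", {}).get(slot, "replace")
            if slot in slots and policy == "append":
                slots[slot] = slots[slot] + deepcopy(value)
            else:
                slots[slot] = deepcopy(value)
    return order, slots


# Hypothetical diamond: two overlays share one base.
CLASSES = {
    "base": {"slots": {"sections": ["Problem", "Decision"]}},
    "security": {
        "extends": ["base"],
        "slots": {"sections": ["Threat Model"]},
        "merge_policies": {"sections": "append"},
    },
    "enterprise": {
        "extends": ["base"],
        "slots": {"sections": ["Compliance"]},
        "merge_policies": {"sections": "append"},
    },
    "enterprise-secure": {"extends": ["enterprise", "security"]},
}

order, slots = compose("enterprise-secure", CLASSES)
print(" -> ".join(order))
print(slots["sections"])
```

Under these assumptions the linearization is `enterprise-secure -> enterprise -> security -> base`, and because merging runs in reverse order, the `sections` slot ends up as `Problem`, `Decision`, `Threat Model`, `Compliance`. The shared `base` class contributes once even though both parents extend it.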
25
src/markitect_tool/explode/__init__.py
Normal file
@@ -0,0 +1,25 @@
"""Reversible explode/implode operations for Markdown documents."""

from markitect_tool.explode.engine import (
    EXPLODE_MANIFEST_NAME,
    ExplodeEntry,
    ExplodeError,
    ExplodeManifest,
    ExplodeResult,
    ImplodeResult,
    explode_markdown_file,
    implode_markdown_directory,
    load_explode_manifest,
)

__all__ = [
    "EXPLODE_MANIFEST_NAME",
    "ExplodeEntry",
    "ExplodeError",
    "ExplodeManifest",
    "ExplodeResult",
    "ImplodeResult",
    "explode_markdown_file",
    "implode_markdown_directory",
    "load_explode_manifest",
]
324
src/markitect_tool/explode/engine.py
Normal file
@@ -0,0 +1,324 @@
"""Manifest-first reversible explode/implode for Markdown files."""

from __future__ import annotations

import hashlib
import re
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any

import yaml

from markitect_tool.core import Heading, parse_markdown


EXPLODE_MANIFEST_NAME = "markitect-explode.yaml"


class ExplodeError(ValueError):
    """Raised when explode or implode cannot preserve a safe roundtrip."""


@dataclass(frozen=True)
class ExplodeEntry:
    """One file entry in an exploded Markdown directory."""

    kind: str
    file: str
    order: int
    unit_id: str
    line_start: int
    line_end: int
    heading_level: int | None = None
    heading_text: str | None = None
    content_hash: str = ""

    def to_dict(self) -> dict[str, Any]:
        return {key: value for key, value in asdict(self).items() if value is not None}


@dataclass(frozen=True)
class ExplodeManifest:
    """Manifest used to implode an exploded Markdown directory."""

    version: int
    source_path: str
    source_hash: str
    variant: str
    frontmatter_raw: str = ""
    entries: list[ExplodeEntry] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        return {
            "version": self.version,
            "source_path": self.source_path,
            "source_hash": self.source_hash,
            "variant": self.variant,
            "frontmatter_raw": self.frontmatter_raw,
            "entries": [entry.to_dict() for entry in self.entries],
        }


@dataclass(frozen=True)
class ExplodeResult:
    """Result of exploding a Markdown file into a directory."""

    manifest_path: str
    output_dir: str
    manifest: ExplodeManifest
    written_files: list[str]

    def to_dict(self) -> dict[str, Any]:
        return {
            "manifest_path": self.manifest_path,
            "output_dir": self.output_dir,
            "manifest": self.manifest.to_dict(),
            "written_files": self.written_files,
        }


@dataclass(frozen=True)
class ImplodeResult:
    """Result of rebuilding Markdown from an explode manifest."""

    markdown: str
    manifest_path: str
    source_hash: str
    current_hash: str
    entries: list[str]

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)


def explode_markdown_file(
    path: str | Path,
    output_dir: str | Path,
    *,
    variant: str = "flat",
    overwrite: bool = False,
) -> ExplodeResult:
    """Explode a Markdown file into section files plus a roundtrip manifest."""

    if variant not in {"flat", "hierarchical"}:
        raise ExplodeError("Explode variant must be `flat` or `hierarchical`")

    source_path = Path(path)
    target_dir = Path(output_dir)
    markdown = source_path.read_text(encoding="utf-8")
    if target_dir.exists() and any(target_dir.iterdir()) and not overwrite:
        raise ExplodeError(f"Output directory is not empty: {target_dir}")
    target_dir.mkdir(parents=True, exist_ok=True)

    frontmatter_raw, body_start_line = _split_frontmatter_raw(markdown)
    entries_with_text = _explode_entries(markdown, body_start_line, variant)
    written_files: list[str] = []
    entries: list[ExplodeEntry] = []

    for entry, text in entries_with_text:
        entry_path = _safe_entry_path(target_dir, entry.file)
        entry_path.parent.mkdir(parents=True, exist_ok=True)
        entry_path.write_text(text, encoding="utf-8")
        written_files.append(str(entry_path))
        entries.append(entry)

    manifest = ExplodeManifest(
        version=1,
        source_path=str(source_path),
        source_hash=_hash_text(markdown),
        variant=variant,
        frontmatter_raw=frontmatter_raw,
        entries=entries,
    )
    manifest_path = target_dir / EXPLODE_MANIFEST_NAME
    manifest_path.write_text(yaml.safe_dump(manifest.to_dict(), sort_keys=False), encoding="utf-8")
    return ExplodeResult(
        manifest_path=str(manifest_path),
        output_dir=str(target_dir),
        manifest=manifest,
        written_files=written_files + [str(manifest_path)],
    )


def implode_markdown_directory(
    directory: str | Path,
    *,
    manifest_path: str | Path | None = None,
) -> ImplodeResult:
    """Implode a Markdown directory created by :func:`explode_markdown_file`."""

    root = Path(directory)
    manifest_file = Path(manifest_path) if manifest_path else root / EXPLODE_MANIFEST_NAME
    manifest = load_explode_manifest(manifest_file)
    parts = [manifest.frontmatter_raw]
    entry_files: list[str] = []

    for entry in manifest.entries:
        entry_path = _safe_entry_path(root, entry.file)
        if not entry_path.exists() or not entry_path.is_file():
            raise ExplodeError(f"Exploded entry file not found: {entry.file}")
        parts.append(entry_path.read_text(encoding="utf-8"))
        entry_files.append(str(entry_path))

    markdown = "".join(parts)
    return ImplodeResult(
        markdown=markdown,
        manifest_path=str(manifest_file),
        source_hash=manifest.source_hash,
        current_hash=_hash_text(markdown),
        entries=entry_files,
    )


def load_explode_manifest(path: str | Path) -> ExplodeManifest:
    """Load an explode manifest from YAML."""

    manifest_path = Path(path)
    data = yaml.safe_load(manifest_path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ExplodeError("Explode manifest must be a mapping")
    entries = data.get("entries", [])
    if not isinstance(entries, list):
        raise ExplodeError("Explode manifest entries must be a list")
    return ExplodeManifest(
        version=int(data.get("version", 1)),
        source_path=str(data.get("source_path", "")),
        source_hash=str(data.get("source_hash", "")),
        variant=str(data.get("variant", "flat")),
        frontmatter_raw=str(data.get("frontmatter_raw", "")),
        entries=[_entry_from_mapping(entry) for entry in entries],
    )


def _explode_entries(
    markdown: str,
    body_start_line: int,
    variant: str,
) -> list[tuple[ExplodeEntry, str]]:
    lines = markdown.splitlines(keepends=True)
    headings = parse_markdown(markdown).headings
    entries: list[tuple[ExplodeEntry, str]] = []
    used_ids: dict[str, int] = {}
    order = 0

    first_heading_line = headings[0].line if headings else len(lines) + 1
    preamble_text = "".join(lines[body_start_line - 1:first_heading_line - 1])
    if preamble_text or not headings:
        entry = ExplodeEntry(
            kind="preamble",
            file="00-preamble.md",
            order=order,
            unit_id="preamble",
            line_start=body_start_line,
            line_end=max(first_heading_line - 1, body_start_line),
            content_hash=_hash_text(preamble_text),
        )
        entries.append((entry, preamble_text))
        order += 1

    hierarchy: dict[int, str] = {}
    for index, heading in enumerate(headings):
        start = heading.line
        end = headings[index + 1].line - 1 if index + 1 < len(headings) else len(lines)
        text = "".join(lines[start - 1:end])
        unit_id = _dedupe_id(_slug(_heading_title(heading)), used_ids)
        file_path = _entry_file_for_heading(heading, index + 1, unit_id, variant, hierarchy)
        entry = ExplodeEntry(
            kind="section",
            file=file_path,
            order=order,
            unit_id=unit_id,
            line_start=start,
            line_end=end,
            heading_level=heading.level,
            heading_text=heading.text,
            content_hash=_hash_text(text),
        )
        entries.append((entry, text))
        order += 1

    return entries


def _entry_file_for_heading(
    heading: Heading,
    index: int,
    unit_id: str,
    variant: str,
    hierarchy: dict[int, str],
) -> str:
    filename = f"{index:02d}-{unit_id}.md"
    if variant == "flat":
        return f"sections/{filename}"

    for level in list(hierarchy):
        if level >= heading.level:
            del hierarchy[level]
    parents = [hierarchy[level] for level in sorted(hierarchy) if level < heading.level]
    hierarchy[heading.level] = f"{index:02d}-{unit_id}"
    return str(Path(*parents, filename)) if parents else filename


def _entry_from_mapping(data: Any) -> ExplodeEntry:
    if not isinstance(data, dict):
        raise ExplodeError("Explode manifest entry must be a mapping")
    return ExplodeEntry(
        kind=str(data["kind"]),
        file=str(data["file"]),
        order=int(data["order"]),
        unit_id=str(data["unit_id"]),
        line_start=int(data["line_start"]),
        line_end=int(data["line_end"]),
        heading_level=int(data["heading_level"]) if data.get("heading_level") is not None else None,
        heading_text=str(data["heading_text"]) if data.get("heading_text") is not None else None,
        content_hash=str(data.get("content_hash", "")),
    )


def _safe_entry_path(root: Path, relative_path: str) -> Path:
    path = Path(relative_path)
    if path.is_absolute():
        raise ExplodeError(f"Exploded entry path must be relative: {relative_path}")
    resolved = (root / path).resolve()
    try:
        resolved.relative_to(root.resolve())
    except ValueError as exc:
        raise ExplodeError(f"Exploded entry path escapes directory: {relative_path}") from exc
    return resolved


def _split_frontmatter_raw(markdown: str) -> tuple[str, int]:
    if not markdown.startswith("---\n"):
        return "", 1
    end = markdown.find("\n---", 4)
    if end == -1:
        return "", 1
    closing_end = markdown.find("\n", end + 4)
    if closing_end == -1:
        closing_end = len(markdown)
    else:
        closing_end += 1
    frontmatter_raw = markdown[:closing_end]
    return frontmatter_raw, frontmatter_raw.count("\n") + 1


def _heading_title(heading: Heading) -> str:
    text = re.sub(r"\s+\{#[A-Za-z0-9_.:-]+\}\s*$", "", heading.text.strip())
    return text or "section"


def _dedupe_id(unit_id: str, used_ids: dict[str, int]) -> str:
    count = used_ids.get(unit_id, 0) + 1
    used_ids[unit_id] = count
    return unit_id if count == 1 else f"{unit_id}-{count}"


def _slug(value: str) -> str:
    slug = re.sub(r"[^a-z0-9_.:-]+", "-", value.strip().lower())
    slug = re.sub(r"-+", "-", slug).strip("-")
    return slug or "section"


def _hash_text(text: str) -> str:
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
|
||||
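The roundtrip property the explode engine relies on (split the text into parts on line boundaries, concatenate them unchanged, compare SHA-256 hashes) can be illustrated with a small standalone sketch. This is not the module's code: heading detection here uses a simple regular expression rather than `parse_markdown`, so it would misread `#` lines inside fenced code blocks, frontmatter handling is omitted, and the names `HEADING`, `split_sections`, and `SOURCE` are hypothetical.

```python
import hashlib
import re

# Simplified ATX heading detector (assumption: no fenced code blocks in the input).
HEADING = re.compile(r"^#{1,6} ")


def hash_text(text: str) -> str:
    """Content fingerprint in the same `sha256:<hex>` shape the manifest uses."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def split_sections(markdown: str) -> list[str]:
    """Split into an optional preamble plus one part per heading, keeping line endings."""
    lines = markdown.splitlines(keepends=True)
    starts = [index for index, line in enumerate(lines) if HEADING.match(line)]
    bounds = [0] + starts + [len(lines)]
    parts = ["".join(lines[a:b]) for a, b in zip(bounds, bounds[1:])]
    # Drop the empty preamble that appears when the text starts with a heading.
    return [part for part in parts if part]


SOURCE = "Intro text.\n\n# Title\n\nBody.\n\n## Details\n\nMore.\n"

parts = split_sections(SOURCE)
rebuilt = "".join(parts)

# Lossless roundtrip: joining the untouched parts reproduces the source exactly.
print(len(parts), hash_text(rebuilt) == hash_text(SOURCE))
```

Because every part keeps its original line endings and nothing is inserted between parts, comparing the hash of the rebuilt text with the stored source hash is enough to tell whether any exploded file was edited.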
23
src/markitect_tool/literate/__init__.py
Normal file
@@ -0,0 +1,23 @@
"""Markdown-native literate weave/tangle workflows."""

from markitect_tool.literate.engine import (
    CodeChunk,
    LiterateFile,
    TangleResult,
    WeaveResult,
    discover_code_chunks,
    tangle_markdown,
    weave_markdown,
    write_tangle_files,
)

__all__ = [
    "CodeChunk",
    "LiterateFile",
    "TangleResult",
    "WeaveResult",
    "discover_code_chunks",
    "tangle_markdown",
    "weave_markdown",
    "write_tangle_files",
]
317
src/markitect_tool/literate/engine.py
Normal file
@@ -0,0 +1,317 @@
"""Literate programming helpers for Markdown fenced code chunks."""

from __future__ import annotations

import hashlib
import re
import shlex
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any

from markdown_it import MarkdownIt

from markitect_tool.diagnostics import Diagnostic, SourceLocation
from markitect_tool.ops import OperationProvenance
|
||||
|
||||
|
||||
@dataclass(frozen=True)
class CodeChunk:
    """A named fenced code chunk."""

    chunk_id: str
    content: str
    language: str | None = None
    target_path: str | None = None
    references: list[str] = field(default_factory=list)
    source_path: str | None = None
    line_start: int | None = None
    line_end: int | None = None
    content_hash: str = ""

    def to_dict(self) -> dict[str, Any]:
        return {key: value for key, value in asdict(self).items() if value not in (None, [], "")}


@dataclass(frozen=True)
class LiterateFile:
    """One generated file from tangling."""

    path: str
    content: str
    chunk_ids: list[str]

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)


@dataclass(frozen=True)
class TangleResult:
    """Result of tangling Markdown code chunks."""

    files: list[LiterateFile]
    chunks: list[CodeChunk]
    diagnostics: list[Diagnostic] = field(default_factory=list)
    provenance: list[OperationProvenance] = field(default_factory=list)

    @property
    def valid(self) -> bool:
        return not any(diagnostic.severity == "error" for diagnostic in self.diagnostics)

    def to_dict(self) -> dict[str, Any]:
        return {
            "valid": self.valid,
            "files": [file.to_dict() for file in self.files],
            "chunks": [chunk.to_dict() for chunk in self.chunks],
            "diagnostics": [diagnostic.to_dict() for diagnostic in self.diagnostics],
            "provenance": [event.to_dict() for event in self.provenance],
        }


@dataclass(frozen=True)
class WeaveResult:
    """Result of weaving Markdown documentation with a chunk index."""

    markdown: str
    chunks: list[CodeChunk]

    def to_dict(self) -> dict[str, Any]:
        return {
            "markdown": self.markdown,
            "chunks": [chunk.to_dict() for chunk in self.chunks],
        }


_CHUNK_REF_RE = re.compile(r"<<(?P<id>[A-Za-z0-9_.:-]+)>>")
_CHUNK_LINE_REF_RE = re.compile(r"^(?P<indent>[ \t]*)<<(?P<id>[A-Za-z0-9_.:-]+)>>[ \t]*$", re.MULTILINE)
|
||||
|
||||
|
||||
def discover_code_chunks(
    markdown: str,
    *,
    source_path: str | Path | None = None,
) -> list[CodeChunk]:
    """Discover named fenced code chunks in Markdown order."""

    parser = MarkdownIt("commonmark", {"tables": True}).enable("table")
    chunks: list[CodeChunk] = []
    used_ids: dict[str, int] = {}
    for token in parser.parse(markdown):
        if token.type != "fence":
            continue
        attrs = _parse_fence_info(token.info)
        chunk_id = attrs.get("id")
        if not chunk_id:
            continue
        chunk_id = _dedupe_id(_slug(chunk_id), used_ids)
        line_start = token.map[0] + 1 if token.map else None
        line_end = token.map[1] if token.map else None
        chunks.append(
            CodeChunk(
                chunk_id=chunk_id,
                content=token.content,
                language=attrs.get("language"),
                target_path=attrs.get("tangle") or attrs.get("target"),
                references=_chunk_references(token.content),
                source_path=str(source_path) if source_path else None,
                line_start=line_start,
                line_end=line_end,
                content_hash=_hash_text(token.content),
            )
        )
    return chunks
|
||||
|
||||
|
||||
def tangle_markdown(
    markdown: str,
    *,
    source_path: str | Path | None = None,
) -> TangleResult:
    """Tangle named chunks into target files."""

    chunks = discover_code_chunks(markdown, source_path=source_path)
    chunks_by_id = {chunk.chunk_id: chunk for chunk in chunks}
    diagnostics: list[Diagnostic] = []
    provenance: list[OperationProvenance] = []
    target_chunks: dict[str, list[CodeChunk]] = {}
    for chunk in chunks:
        if chunk.target_path:
            target_chunks.setdefault(chunk.target_path, []).append(chunk)

    files: list[LiterateFile] = []
    for target_path, grouped_chunks in target_chunks.items():
        rendered_parts: list[str] = []
        for chunk in grouped_chunks:
            rendered_parts.append(_expand_chunk(chunk, chunks_by_id, diagnostics, []))
            provenance.append(
                OperationProvenance(
                    operation="literate.tangle",
                    source_path=chunk.source_path,
                    line_start=chunk.line_start,
                    line_end=chunk.line_end,
                    target_path=target_path,
                    dependencies=[chunk.source_path] if chunk.source_path else [],
                    metadata={"chunk_id": chunk.chunk_id, "references": chunk.references},
                )
            )
        files.append(
            LiterateFile(
                path=target_path,
                content=_join_tangled_parts(rendered_parts),
                chunk_ids=[chunk.chunk_id for chunk in grouped_chunks],
            )
        )

    return TangleResult(
        files=files,
        chunks=chunks,
        diagnostics=diagnostics,
        provenance=provenance,
    )
|
||||
|
||||
|
||||
def weave_markdown(
    markdown: str,
    *,
    source_path: str | Path | None = None,
) -> WeaveResult:
    """Append a deterministic chunk index to human-readable Markdown."""

    chunks = discover_code_chunks(markdown, source_path=source_path)
    if not chunks:
        return WeaveResult(markdown=markdown, chunks=[])

    lines = [markdown.rstrip(), "", "## Code Chunk Index", ""]
    for chunk in chunks:
        target = f" -> `{chunk.target_path}`" if chunk.target_path else ""
        refs = f"; refs: {', '.join(f'`{ref}`' for ref in chunk.references)}" if chunk.references else ""
        location = f" line {chunk.line_start}" if chunk.line_start else ""
        lines.append(f"- `{chunk.chunk_id}`{target}{refs}{location}")
    return WeaveResult(markdown="\n".join(lines).rstrip() + "\n", chunks=chunks)
|
||||
|
||||
|
||||
def write_tangle_files(result: TangleResult, output_dir: str | Path) -> list[str]:
    """Write tangled files under an output directory."""

    root = Path(output_dir)
    root.mkdir(parents=True, exist_ok=True)
    written: list[str] = []
    for file in result.files:
        target = _safe_output_path(root, file.path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(file.content, encoding="utf-8")
        written.append(str(target))
    return written
|
||||
|
||||
|
||||
def _expand_chunk(
    chunk: CodeChunk,
    chunks_by_id: dict[str, CodeChunk],
    diagnostics: list[Diagnostic],
    stack: list[str],
) -> str:
    if chunk.chunk_id in stack:
        diagnostics.append(
            Diagnostic(
                severity="error",
                code="literate.chunk_cycle",
                message="Cyclic chunk reference: " + " -> ".join(stack + [chunk.chunk_id]),
                source=SourceLocation(path=chunk.source_path, line=chunk.line_start),
            )
        )
        return f"<<{chunk.chunk_id}>>"

    def replace_line(match: re.Match[str]) -> str:
        indent = match.group("indent")
        expanded = _expand_reference(match.group("id"), chunks_by_id, diagnostics, stack + [chunk.chunk_id], chunk)
        return "\n".join(f"{indent}{line}" if line else line for line in expanded.splitlines())

    rendered = _CHUNK_LINE_REF_RE.sub(replace_line, chunk.content)

    def replace_inline(match: re.Match[str]) -> str:
        return _expand_reference(match.group("id"), chunks_by_id, diagnostics, stack + [chunk.chunk_id], chunk)

    return _CHUNK_REF_RE.sub(replace_inline, rendered)
|
||||
|
||||
|
||||
def _expand_reference(
    chunk_id: str,
    chunks_by_id: dict[str, CodeChunk],
    diagnostics: list[Diagnostic],
    stack: list[str],
    source_chunk: CodeChunk,
) -> str:
    # Chunk ids are stored slugged (lowercased) by discover_code_chunks, so the
    # reference is normalized the same way before lookup.
    referenced = chunks_by_id.get(_slug(chunk_id))
    if not referenced:
        diagnostics.append(
            Diagnostic(
                severity="error",
                code="literate.missing_chunk",
                message=f"Missing chunk reference `{chunk_id}`",
                source=SourceLocation(path=source_chunk.source_path, line=source_chunk.line_start),
            )
        )
        return f"<<{chunk_id}>>"
    return _expand_chunk(referenced, chunks_by_id, diagnostics, stack)
|
||||
|
||||
|
||||
def _join_tangled_parts(parts: list[str]) -> str:
    rendered = "\n".join(part.rstrip("\n") for part in parts if part is not None)
    return rendered.rstrip() + "\n" if rendered else ""


def _safe_output_path(root: Path, relative_path: str) -> Path:
    path = Path(relative_path)
    if path.is_absolute():
        raise ValueError(f"Tangle target must be relative: {relative_path}")
    resolved = (root / path).resolve()
    try:
        resolved.relative_to(root.resolve())
    except ValueError as exc:
        raise ValueError(f"Tangle target escapes output directory: {relative_path}") from exc
    return resolved
|
||||
|
||||
|
||||
def _parse_fence_info(info: str) -> dict[str, str]:
    # `\s*` before the brace also accepts attribute-only info strings such as `{#setup}`,
    # which would otherwise be returned whole as the language.
    match = re.match(r"^(?P<language>[^\s{]+)?(?:\s*\{(?P<attrs>.*)\})?\s*$", info.strip())
    if not match:
        return {"language": info.strip()} if info.strip() else {}
    attrs = _parse_attrs(match.group("attrs") or "")
    language = match.group("language")
    if language:
        attrs["language"] = language
    return attrs
|
||||
|
||||
|
||||
def _parse_attrs(raw: str) -> dict[str, str]:
    attrs: dict[str, str] = {}
    try:
        parts = shlex.split(raw)
    except ValueError:
        # shlex raises on unbalanced quotes; fall back to whitespace splitting
        # so one malformed fence does not abort chunk discovery.
        parts = raw.split()
    for part in parts:
        if part.startswith("#") and len(part) > 1:
            attrs["id"] = part[1:]
            continue
        if "=" not in part:
            attrs[part] = "true"
            continue
        key, value = part.split("=", 1)
        attrs[key.strip()] = value.strip()
    return attrs
|
||||
|
||||
|
||||
def _chunk_references(content: str) -> list[str]:
    return [match.group("id") for match in _CHUNK_REF_RE.finditer(content)]


def _dedupe_id(unit_id: str, used_ids: dict[str, int]) -> str:
    count = used_ids.get(unit_id, 0) + 1
    used_ids[unit_id] = count
    return unit_id if count == 1 else f"{unit_id}-{count}"


def _slug(value: str) -> str:
    slug = re.sub(r"[^a-z0-9_.:-]+", "-", value.strip().lower())
    slug = re.sub(r"-+", "-", slug).strip("-")
    return slug or "chunk"


def _hash_text(text: str) -> str:
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
|
||||
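The tangle step in `literate/engine.py` inlines whole-line `<<id>>` references while keeping the indentation of the referencing line, and it guards against reference cycles with a stack. The sketch below illustrates that idea in isolation. It is a simplification under stated assumptions: `expand` and the plain `dict` of chunk bodies are hypothetical, it handles only whole-line references, and it raises exceptions where the engine records `Diagnostic` entries instead.

```python
from __future__ import annotations

import re

# Same whole-line reference pattern as the engine's _CHUNK_LINE_REF_RE.
REF_LINE = re.compile(r"^(?P<indent>[ \t]*)<<(?P<id>[A-Za-z0-9_.:-]+)>>[ \t]*$", re.MULTILINE)


def expand(chunk_id: str, chunks: dict[str, str], stack: tuple[str, ...] = ()) -> str:
    """Recursively inline whole-line <<id>> references, keeping indentation."""
    if chunk_id in stack:
        raise ValueError("cyclic chunk reference: " + " -> ".join(stack + (chunk_id,)))
    if chunk_id not in chunks:
        raise KeyError(f"missing chunk: {chunk_id}")

    def replace(match: re.Match[str]) -> str:
        indent = match.group("indent")
        body = expand(match.group("id"), chunks, stack + (chunk_id,))
        # Re-indent every non-empty line of the referenced chunk.
        return "\n".join(indent + line if line else line for line in body.splitlines())

    return REF_LINE.sub(replace, chunks[chunk_id])


chunks = {
    "main": "def main():\n    <<body>>\n",
    "body": "print('hello')\nprint('world')\n",
}
print(expand("main", chunks), end="")
```

Passing the stack as an immutable tuple means each branch of the recursion sees only its own ancestors, so a chunk referenced twice from different places is not mistaken for a cycle.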
@@ -4,6 +4,7 @@ from markitect_tool.ops.engine import (
|
||||
ComposeResult,
|
||||
IncludeError,
|
||||
IncludeResult,
|
||||
OperationProvenance,
|
||||
TransformResult,
|
||||
compose_files,
|
||||
resolve_includes,
|
||||
@@ -14,6 +15,7 @@ __all__ = [
|
||||
"ComposeResult",
|
||||
"IncludeError",
|
||||
"IncludeResult",
|
||||
"OperationProvenance",
|
||||
"TransformResult",
|
||||
"compose_files",
|
||||
"resolve_includes",
|
||||
|
||||
@@ -9,6 +9,7 @@ from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
import yaml
|
||||
from markdown_it import MarkdownIt
|
||||
|
||||
from markitect_tool.core import parse_markdown
|
||||
from markitect_tool.query import extract_document
|
||||
@@ -18,15 +19,46 @@ class IncludeError(ValueError):
|
||||
"""Raised when include resolution cannot continue."""
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class OperationProvenance:
|
||||
"""Structured provenance for deterministic Markdown operations."""
|
||||
|
||||
operation: str
|
||||
source_path: str | None = None
|
||||
line_start: int | None = None
|
||||
line_end: int | None = None
|
||||
target_path: str | None = None
|
||||
dependencies: list[str] = field(default_factory=list)
|
||||
metadata: dict[str, Any] = field(default_factory=dict)
|
||||
|
||||
def to_dict(self) -> dict[str, Any]:
|
||||
data = {
|
||||
"operation": self.operation,
|
||||
"source_path": self.source_path,
|
||||
"line_start": self.line_start,
|
||||
"line_end": self.line_end,
|
||||
"target_path": self.target_path,
|
||||
"dependencies": self.dependencies or None,
|
||||
"metadata": self.metadata or None,
|
||||
}
|
||||
return {key: value for key, value in data.items() if value is not None}
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class TransformResult:
|
||||
"""Result of a deterministic Markdown transform."""
|
||||
|
||||
markdown: str
|
||||
operations: list[str] = field(default_factory=list)
|
||||
provenance: list[OperationProvenance] = field(default_factory=list)
|
||||
|
||||
     def to_dict(self) -> dict[str, Any]:
-        return asdict(self)
+        data: dict[str, Any] = {
+            "markdown": self.markdown,
+            "operations": self.operations,
+            "provenance": [event.to_dict() for event in self.provenance],
+        }
+        return {key: value for key, value in data.items() if value}
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
@@ -46,9 +78,15 @@ class IncludeResult:
|
||||
|
||||
markdown: str
|
||||
included_paths: list[str] = field(default_factory=list)
|
||||
provenance: list[OperationProvenance] = field(default_factory=list)
|
||||
|
||||
     def to_dict(self) -> dict[str, Any]:
-        return asdict(self)
+        data: dict[str, Any] = {
+            "markdown": self.markdown,
+            "included_paths": self.included_paths,
+            "provenance": [event.to_dict() for event in self.provenance],
+        }
+        return {key: value for key, value in data.items() if value}
|
||||
|
||||
|
||||
_COMMENT_INCLUDE_RE = re.compile(r"<!--\s*mkt:include\s+(?P<attrs>.*?)\s*-->", re.DOTALL)
|
||||
@@ -68,15 +106,30 @@ def transform_markdown(
|
||||
"""Apply deterministic operations to one Markdown document."""
|
||||
|
||||
operations: list[str] = []
|
||||
provenance: list[OperationProvenance] = []
|
||||
frontmatter, body = _split_frontmatter(markdown)
|
||||
|
||||
if set_frontmatter:
|
||||
frontmatter = _deep_merge(frontmatter, set_frontmatter)
|
||||
operations.append("set_frontmatter")
|
||||
provenance.append(
|
||||
OperationProvenance(
|
||||
operation="set_frontmatter",
|
||||
source_path=source_path,
|
||||
metadata={"keys": sorted(set_frontmatter.keys())},
|
||||
)
|
||||
)
|
||||
|
||||
     if heading_delta:
-        body = shift_heading_levels(body, heading_delta)
+        body, affected_lines = _shift_heading_levels(body, heading_delta)
         operations.append(f"shift_headings:{heading_delta}")
+        provenance.append(
+            OperationProvenance(
+                operation="shift_headings",
+                source_path=source_path,
+                metadata={"delta": heading_delta, "affected_lines": affected_lines},
+            )
+        )
|
||||
|
||||
if extract_selector:
|
||||
document_text = _join_frontmatter(frontmatter, body) if frontmatter else body
|
||||
@@ -84,24 +137,71 @@ def transform_markdown(
|
||||
body = "\n\n".join(extract_document(document, extract_selector))
|
||||
frontmatter = {}
|
||||
operations.append(f"extract:{extract_selector}")
|
||||
provenance.append(
|
||||
OperationProvenance(
|
||||
operation="extract",
|
||||
source_path=source_path,
|
||||
metadata={"selector": extract_selector},
|
||||
)
|
||||
)
|
||||
|
||||
if strip_frontmatter:
|
||||
frontmatter = {}
|
||||
operations.append("strip_frontmatter")
|
||||
provenance.append(
|
||||
OperationProvenance(
|
||||
operation="strip_frontmatter",
|
||||
source_path=source_path,
|
||||
)
|
||||
)
|
||||
|
||||
-    return TransformResult(markdown=_join_frontmatter(frontmatter, body), operations=operations)
+    return TransformResult(
+        markdown=_join_frontmatter(frontmatter, body),
+        operations=operations,
+        provenance=provenance,
+    )
|
||||
|
||||
|
||||
 def shift_heading_levels(markdown: str, delta: int) -> str:
     """Shift ATX heading levels by delta while clamping to levels 1 through 6."""

-    def replace(match: re.Match[str]) -> str:
+    shifted, _affected_lines = _shift_heading_levels(markdown, delta)
+    return shifted
+
+
+def _shift_heading_levels(markdown: str, delta: int) -> tuple[str, list[int]]:
+    ignored_lines = _code_line_numbers(markdown)
+    affected_lines: list[int] = []
+    rendered_lines: list[str] = []
+
+    for line_number, line in enumerate(markdown.splitlines(keepends=True), start=1):
+        if line_number in ignored_lines:
+            rendered_lines.append(line)
+            continue
+        line_body = line.rstrip("\r\n")
+        line_ending = line[len(line_body) :]
+        match = _HEADING_RE.match(line_body)
+        if not match:
+            rendered_lines.append(line)
+            continue
         marks = match.group(1)
         suffix = match.group(2)
         level = min(max(len(marks) + delta, 1), 6)
-        return f"{'#' * level}{suffix}"
+        rendered_lines.append(f"{'#' * level}{suffix}{line_ending}")
+        affected_lines.append(line_number)

-    return _HEADING_RE.sub(replace, markdown)
+    return "".join(rendered_lines), affected_lines
+
+
+def _code_line_numbers(markdown: str) -> set[int]:
+    parser = MarkdownIt("commonmark", {"tables": True}).enable("table")
+    ignored_lines: set[int] = set()
+    for token in parser.parse(markdown):
+        if token.type not in {"fence", "code_block"} or not token.map:
+            continue
+        start, end = token.map
+        ignored_lines.update(range(start + 1, end + 1))
+    return ignored_lines
|
||||
|
||||
|
||||
def compose_files(
|
||||
@@ -154,18 +254,22 @@ def resolve_includes(
|
||||
root = Path(base_dir).resolve()
|
||||
stack = [Path(current_path).resolve()] if current_path else []
|
||||
included: list[Path] = []
|
||||
provenance: list[OperationProvenance] = []
|
||||
resolved = _resolve_include_text(
|
||||
markdown,
|
||||
root=root,
|
||||
current_dir=Path(current_path).resolve().parent if current_path else root,
|
||||
source_path=Path(current_path).resolve() if current_path else None,
|
||||
stack=stack,
|
||||
included=included,
|
||||
provenance=provenance,
|
||||
depth=0,
|
||||
max_depth=max_depth,
|
||||
)
|
||||
return IncludeResult(
|
||||
markdown=resolved,
|
||||
included_paths=[str(path) for path in included],
|
||||
provenance=provenance,
|
||||
)
|
||||
|
||||
|
||||
@@ -174,34 +278,73 @@ def _resolve_include_text(
|
||||
*,
|
||||
root: Path,
|
||||
current_dir: Path,
|
||||
source_path: Path | None,
|
||||
stack: list[Path],
|
||||
included: list[Path],
|
||||
provenance: list[OperationProvenance],
|
||||
depth: int,
|
||||
max_depth: int,
|
||||
) -> str:
|
||||
if depth > max_depth:
|
||||
raise IncludeError(f"Include depth exceeded max_depth={max_depth}")
|
||||
|
||||
def replace_comment(match: re.Match[str]) -> str:
|
||||
attrs = _parse_include_attrs(match.group("attrs"))
|
||||
return _render_include(attrs, root, current_dir, stack, included, depth, max_depth)
|
||||
ignored_lines = _code_line_numbers(markdown)
|
||||
rendered_lines: list[str] = []
|
||||
|
||||
def replace_brace(match: re.Match[str]) -> str:
|
||||
attrs = {"path": match.group("path").strip()}
|
||||
return _render_include(attrs, root, current_dir, stack, included, depth, max_depth)
|
||||
for line_number, line in enumerate(markdown.splitlines(keepends=True), start=1):
|
||||
if line_number in ignored_lines:
|
||||
rendered_lines.append(line)
|
||||
continue
|
||||
|
||||
markdown = _COMMENT_INCLUDE_RE.sub(replace_comment, markdown)
|
||||
return _BRACE_INCLUDE_RE.sub(replace_brace, markdown)
|
||||
def replace_comment(match: re.Match[str]) -> str:
|
||||
attrs = _parse_include_attrs(match.group("attrs"))
|
||||
return _render_include(
|
||||
attrs,
|
||||
root,
|
||||
current_dir,
|
||||
source_path,
|
||||
stack,
|
||||
included,
|
||||
provenance,
|
||||
depth,
|
||||
max_depth,
|
||||
marker_line=line_number,
|
||||
)
|
||||
|
||||
def replace_brace(match: re.Match[str]) -> str:
|
||||
attrs = {"path": match.group("path").strip()}
|
||||
return _render_include(
|
||||
attrs,
|
||||
root,
|
||||
current_dir,
|
||||
source_path,
|
||||
stack,
|
||||
included,
|
||||
provenance,
|
||||
depth,
|
||||
max_depth,
|
||||
marker_line=line_number,
|
||||
)
|
||||
|
||||
line = _COMMENT_INCLUDE_RE.sub(replace_comment, line)
|
||||
line = _BRACE_INCLUDE_RE.sub(replace_brace, line)
|
||||
rendered_lines.append(line)
|
||||
|
||||
return "".join(rendered_lines)
|
||||
|
||||
|
||||
def _render_include(
|
||||
attrs: dict[str, str],
|
||||
root: Path,
|
||||
current_dir: Path,
|
||||
source_path: Path | None,
|
||||
stack: list[Path],
|
||||
included: list[Path],
|
||||
provenance: list[OperationProvenance],
|
||||
depth: int,
|
||||
max_depth: int,
|
||||
*,
|
||||
marker_line: int,
|
||||
) -> str:
|
||||
raw_path = attrs.get("path")
|
||||
if not raw_path:
|
||||
@@ -228,12 +371,33 @@ def _render_include(
|
||||
body = shift_heading_levels(body, heading_delta)
|
||||
|
||||
included.append(include_path)
|
||||
provenance.append(
|
||||
OperationProvenance(
|
||||
operation="include",
|
||||
source_path=str(source_path) if source_path else None,
|
||||
line_start=marker_line,
|
||||
line_end=marker_line,
|
||||
target_path=str(include_path),
|
||||
dependencies=[str(include_path)],
|
||||
metadata={
|
||||
key: value
|
||||
for key, value in {
|
||||
"selector": selector,
|
||||
"heading_delta": heading_delta if heading_delta else None,
|
||||
"include_frontmatter": attrs.get("include_frontmatter"),
|
||||
}.items()
|
||||
if value is not None
|
||||
},
|
||||
)
|
||||
)
|
||||
return _resolve_include_text(
|
||||
body.strip(),
|
||||
root=root,
|
||||
current_dir=include_path.parent,
|
||||
source_path=include_path,
|
||||
stack=stack + [include_path],
|
||||
included=included,
|
||||
provenance=provenance,
|
||||
depth=depth + 1,
|
||||
max_depth=max_depth,
|
||||
)
|
||||
|
||||
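The reworked heading shift in `ops/engine.py` walks the document line by line, leaves lines inside code untouched, and reports which lines it changed. A rough standalone sketch of that behavior is shown below. It rests on assumptions: `_HEADING_RE` is not visible in this diff, so the `HEADING` pattern here is a guess, and fenced code is tracked with a simple hand-written state machine rather than the markdown-it token map the commit uses, so indented code blocks are not skipped.

```python
from __future__ import annotations

import re

HEADING = re.compile(r"^(#{1,6})([ \t].*)?$")  # assumed shape of an ATX heading
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def shift_headings(markdown: str, delta: int) -> tuple[str, list[int]]:
    """Shift ATX headings by delta (clamped to 1..6), skipping fenced code."""
    out: list[str] = []
    affected: list[int] = []
    fence: str | None = None  # marker that opened the current fence, if any
    for number, line in enumerate(markdown.splitlines(keepends=True), start=1):
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        if fence is not None:
            stripped = body.strip()
            # A closing fence repeats the opening character at least as many times.
            if stripped and set(stripped) == {fence[0]} and len(stripped) >= len(fence):
                fence = None
            out.append(line)
            continue
        opener = FENCE.match(body)
        if opener:
            fence = opener.group(1)
            out.append(line)
            continue
        match = HEADING.match(body)
        if not match:
            out.append(line)
            continue
        level = min(max(len(match.group(1)) + delta, 1), 6)
        out.append("#" * level + (match.group(2) or "") + ending)
        affected.append(number)
    return "".join(out), affected


text = "# Title\n\n```\n# not a heading\n```\n\n## Sub\n"
shifted, lines = shift_headings(text, 1)
print(shifted, lines)
```

Splitting with `keepends=True` and re-attaching each line's own ending keeps CRLF and LF files byte-identical outside the changed heading markers.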
27
src/markitect_tool/processor/__init__.py
Normal file
@@ -0,0 +1,27 @@
"""Deterministic fenced-block processor registry."""

from markitect_tool.processor.engine import (
    FencedProcessorBlock,
    ProcessorContext,
    ProcessorOutputFile,
    ProcessorRegistry,
    ProcessorRequest,
    ProcessorResult,
    ProcessorRun,
    default_processor_registry,
    discover_fenced_processors,
    run_fenced_processors,
)

__all__ = [
    "FencedProcessorBlock",
    "ProcessorContext",
    "ProcessorOutputFile",
    "ProcessorRegistry",
    "ProcessorRequest",
    "ProcessorResult",
    "ProcessorRun",
    "default_processor_registry",
    "discover_fenced_processors",
    "run_fenced_processors",
]
|
||||
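The package exported here centers on a registry that maps a processor name to a callable and reports an unknown name as data rather than raising. The sketch below illustrates that pattern only. `Result`, `Registry`, and the string-in, `Result`-out callable signature are hypothetical simplifications, not the commit's `ProcessorResult`, `ProcessorRegistry`, or `ProcessorRequest` types.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Result:
    """Simplified result envelope: content plus collected error messages."""

    content: str | None = None
    errors: list[str] = field(default_factory=list)

    @property
    def valid(self) -> bool:
        return not self.errors


Processor = Callable[[str], Result]


class Registry:
    """Explicit name -> callable mapping; nothing is discovered implicitly."""

    def __init__(self) -> None:
        self._processors: dict[str, Processor] = {}

    def register(self, name: str, processor: Processor) -> None:
        key = name.strip().lower()
        if not key:
            raise ValueError("Processor name cannot be empty")
        self._processors[key] = processor

    def names(self) -> list[str]:
        return sorted(self._processors)

    def run(self, name: str, content: str) -> Result:
        processor = self._processors.get(name.strip().lower())
        if processor is None:
            # Unknown names become an error result instead of an exception.
            return Result(errors=[f"Unknown processor `{name}`"])
        return processor(content)


registry = Registry()
registry.register("identity", lambda text: Result(content=text))
registry.register("uppercase", lambda text: Result(content=text.upper()))
print(registry.names(), registry.run("uppercase", "hello").content)
```

Returning an error result for an unknown name lets a caller run every block in a document and collect all failures in one pass.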
374
src/markitect_tool/processor/engine.py
Normal file
@@ -0,0 +1,374 @@
"""Processor API for deterministic fenced-block workflows."""

from __future__ import annotations

import hashlib
import re
import shlex
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Callable

from markdown_it import MarkdownIt

from markitect_tool.diagnostics import Diagnostic, SourceLocation
from markitect_tool.ops import OperationProvenance
from markitect_tool.reference import (
    ReferenceContext,
    ReferenceResolutionError,
    resolve_reference,
)


ProcessorCallable = Callable[["ProcessorRequest"], "ProcessorResult"]
|
||||
|
||||
|
||||
@dataclass(frozen=True)
class FencedProcessorBlock:
    """A fenced Markdown block that opted into processor handling."""

    processor: str
    content: str
    unit_id: str
    attrs: dict[str, str]
    language: str | None = None
    source_path: str | None = None
    line_start: int | None = None
    line_end: int | None = None
    content_hash: str = ""

    def to_dict(self) -> dict[str, Any]:
        return {key: value for key, value in asdict(self).items() if value not in (None, {}, "")}


@dataclass(frozen=True)
class ProcessorContext:
    """Execution context passed to deterministic processors."""

    root: Path = Path(".")
    current_path: Path | None = None
    namespaces: dict[str, str] = field(default_factory=dict)
    variables: dict[str, Any] = field(default_factory=dict)
    policy: dict[str, Any] = field(default_factory=dict)

    def reference_context(self) -> ReferenceContext:
        return ReferenceContext(
            root=self.root,
            current_path=self.current_path,
            namespaces=self.namespaces,
        )

    def to_dict(self) -> dict[str, Any]:
        data = {
            "root": str(self.root),
            "current_path": str(self.current_path) if self.current_path else None,
            "namespaces": self.namespaces,
            "variables": self.variables,
            "policy": self.policy,
        }
        return {key: value for key, value in data.items() if value not in (None, {}, "")}


@dataclass(frozen=True)
class ProcessorRequest:
    """One processor invocation."""

    block: FencedProcessorBlock
    context: ProcessorContext


@dataclass(frozen=True)
class ProcessorOutputFile:
    """A generated file requested by a processor."""

    path: str
    content: str

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)


@dataclass(frozen=True)
class ProcessorResult:
    """Deterministic processor result envelope."""

    content: str | None = None
    files: list[ProcessorOutputFile] = field(default_factory=list)
    diagnostics: list[Diagnostic] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)
    provenance: list[OperationProvenance] = field(default_factory=list)

    @property
    def valid(self) -> bool:
        return not any(diagnostic.severity == "error" for diagnostic in self.diagnostics)

    def to_dict(self) -> dict[str, Any]:
        data = {
            "valid": self.valid,
            "content": self.content,
            "files": [file.to_dict() for file in self.files],
            "diagnostics": [diagnostic.to_dict() for diagnostic in self.diagnostics],
            "dependencies": self.dependencies,
            "provenance": [event.to_dict() for event in self.provenance],
        }
        return {key: value for key, value in data.items() if value not in (None, [], {})}


@dataclass(frozen=True)
class ProcessorRun:
    """Results from running all processor blocks in a document."""

    source_path: str | None
    blocks: list[FencedProcessorBlock]
    results: list[ProcessorResult]

    @property
    def valid(self) -> bool:
        return all(result.valid for result in self.results)

    def to_dict(self) -> dict[str, Any]:
        return {
            "valid": self.valid,
            "source_path": self.source_path,
            "count": len(self.results),
            "blocks": [block.to_dict() for block in self.blocks],
            "results": [result.to_dict() for result in self.results],
        }
|
||||
|
||||
|
||||
class ProcessorRegistry:
    """Explicit registry for deterministic fenced-block processors."""

    def __init__(self) -> None:
        self._processors: dict[str, ProcessorCallable] = {}

    def register(self, name: str, processor: ProcessorCallable) -> None:
        key = _slug(name)
        if not key:
            raise ValueError("Processor name cannot be empty")
        self._processors[key] = processor

    def names(self) -> list[str]:
        return sorted(self._processors)

    def run(self, request: ProcessorRequest) -> ProcessorResult:
        processor = self._processors.get(_slug(request.block.processor))
        if processor is None:
            return ProcessorResult(
                diagnostics=[
                    Diagnostic(
                        severity="error",
                        code="processor.unknown",
                        message=f"Unknown processor `{request.block.processor}`",
                        source=SourceLocation(
                            path=request.block.source_path,
                            line=request.block.line_start,
                        ),
                    )
                ]
            )
        return processor(request)


def default_processor_registry() -> ProcessorRegistry:
    """Create the default deterministic processor registry."""

    registry = ProcessorRegistry()
    registry.register("identity", _identity_processor)
    registry.register("uppercase", _uppercase_processor)
    registry.register("include", _include_processor)
    return registry
|
||||
|
||||
|
||||
def discover_fenced_processors(
    markdown: str,
    *,
    source_path: str | Path | None = None,
) -> list[FencedProcessorBlock]:
    """Discover fenced blocks that explicitly opt into processor handling."""

    parser = MarkdownIt("commonmark", {"tables": True}).enable("table")
    blocks: list[FencedProcessorBlock] = []
    used_ids: dict[str, int] = {}
    for index, token in enumerate(parser.parse(markdown)):
        if token.type != "fence":
            continue
        attrs = _parse_fence_info(token.info)
        processor = _processor_name(attrs)
        if not processor:
            continue
        unit_id = _dedupe_id(_slug(attrs.get("id") or f"{processor}-{index}"), used_ids)
        line_start = token.map[0] + 1 if token.map else None
        line_end = token.map[1] if token.map else None
        blocks.append(
            FencedProcessorBlock(
                processor=processor,
                content=token.content,
                unit_id=unit_id,
                attrs={
                    key: value
                    for key, value in attrs.items()
                    if key not in {"id", "language", "processor"}
                },
                language=attrs.get("language"),
                source_path=str(source_path) if source_path else None,
                line_start=line_start,
                line_end=line_end,
                content_hash=_hash_text(token.content),
            )
        )
    return blocks


def run_fenced_processors(
    markdown: str,
    *,
    context: ProcessorContext,
    registry: ProcessorRegistry | None = None,
    source_path: str | Path | None = None,
) -> ProcessorRun:
    """Run all processor-marked fenced blocks in document order."""

    active_registry = registry or default_processor_registry()
    blocks = discover_fenced_processors(markdown, source_path=source_path or context.current_path)
    results = [
        active_registry.run(ProcessorRequest(block=block, context=context))
        for block in blocks
    ]
    return ProcessorRun(
        source_path=str(source_path or context.current_path) if source_path or context.current_path else None,
        blocks=blocks,
        results=results,
    )
|
||||
|
||||
|
||||
def _identity_processor(request: ProcessorRequest) -> ProcessorResult:
    return ProcessorResult(
        content=request.block.content,
        provenance=[
            OperationProvenance(
                operation="processor.identity",
                source_path=request.block.source_path,
                line_start=request.block.line_start,
                line_end=request.block.line_end,
                metadata={"unit_id": request.block.unit_id},
            )
        ],
    )


def _uppercase_processor(request: ProcessorRequest) -> ProcessorResult:
    return ProcessorResult(
        content=request.block.content.upper(),
        provenance=[
            OperationProvenance(
                operation="processor.uppercase",
                source_path=request.block.source_path,
                line_start=request.block.line_start,
                line_end=request.block.line_end,
                metadata={"unit_id": request.block.unit_id},
            )
        ],
    )
|
||||
|
||||
|
||||
def _include_processor(request: ProcessorRequest) -> ProcessorResult:
|
||||
reference = request.block.attrs.get("ref")
|
||||
if not reference:
|
||||
return ProcessorResult(
|
||||
diagnostics=[
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="processor.include.missing_ref",
|
||||
message="Include processor requires a `ref` attribute",
|
||||
source=SourceLocation(
|
||||
path=request.block.source_path,
|
||||
line=request.block.line_start,
|
||||
),
|
||||
)
|
||||
]
|
||||
)
|
||||
try:
|
||||
resolution = resolve_reference(reference, context=request.context.reference_context())
|
||||
except ReferenceResolutionError as exc:
|
||||
return ProcessorResult(
|
||||
diagnostics=[
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="processor.include.reference_error",
|
||||
message=str(exc),
|
||||
source=SourceLocation(
|
||||
path=request.block.source_path,
|
||||
line=request.block.line_start,
|
||||
),
|
||||
)
|
||||
]
|
||||
)
|
||||
content = "\n\n".join(unit.text for unit in resolution.units)
|
||||
return ProcessorResult(
|
||||
content=content,
|
||||
dependencies=[resolution.target_path],
|
||||
provenance=[
|
||||
OperationProvenance(
|
||||
operation="processor.include",
|
||||
source_path=request.block.source_path,
|
||||
line_start=request.block.line_start,
|
||||
line_end=request.block.line_end,
|
||||
target_path=resolution.target_path,
|
||||
dependencies=[resolution.target_path],
|
||||
metadata={"ref": reference, "unit_ids": [unit.unit_id for unit in resolution.units]},
|
||||
)
|
||||
],
|
||||
)
|
||||
|
||||
|
||||
def _processor_name(attrs: dict[str, str]) -> str | None:
|
||||
if "processor" in attrs:
|
||||
return attrs["processor"]
|
||||
language = attrs.get("language", "")
|
||||
if language.startswith("mkt-"):
|
||||
return language.removeprefix("mkt-")
|
||||
if language == "mkt" and "type" in attrs:
|
||||
return attrs["type"]
|
||||
return None
|
||||
|
||||
|
||||
def _parse_fence_info(info: str) -> dict[str, str]:
|
||||
match = re.match(r"^(?P<language>[^\s{]+)?(?:\s+\{(?P<attrs>.*)\})?\s*$", info.strip())
|
||||
if not match:
|
||||
return {"language": info.strip()} if info.strip() else {}
|
||||
attrs = _parse_attrs(match.group("attrs") or "")
|
||||
language = match.group("language")
|
||||
if language:
|
||||
attrs["language"] = language
|
||||
return attrs
|
||||
|
||||
|
||||
def _parse_attrs(raw: str) -> dict[str, str]:
|
||||
attrs: dict[str, str] = {}
|
||||
for part in shlex.split(raw):
|
||||
if part.startswith("#") and len(part) > 1:
|
||||
attrs["id"] = part[1:]
|
||||
continue
|
||||
if "=" not in part:
|
||||
attrs[part] = "true"
|
||||
continue
|
||||
key, value = part.split("=", 1)
|
||||
attrs[key.strip()] = value.strip()
|
||||
return attrs
|
||||
|
||||
|
||||
def _dedupe_id(unit_id: str, used_ids: dict[str, int]) -> str:
|
||||
count = used_ids.get(unit_id, 0) + 1
|
||||
used_ids[unit_id] = count
|
||||
return unit_id if count == 1 else f"{unit_id}-{count}"
|
||||
|
||||
|
||||
def _slug(value: str) -> str:
|
||||
slug = re.sub(r"[^a-z0-9_.:-]+", "-", value.strip().lower())
|
||||
slug = re.sub(r"-+", "-", slug).strip("-")
|
||||
return slug
|
||||
|
||||
|
||||
def _hash_text(text: str) -> str:
|
||||
return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
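The fence-info handling above can be illustrated with a small standalone sketch. It mirrors the regex and `shlex` logic shown in this file, but the function names `parse_fence_info` and `processor_name` (without the leading underscore) are local to the example, and the sample info strings are assumptions chosen for illustration rather than fixtures from the repository.

```python
from __future__ import annotations

import re
import shlex

# Same shape as the pattern in the diff: optional language, optional `{attrs}`.
FENCE_INFO_RE = re.compile(r"^(?P<language>[^\s{]+)?(?:\s+\{(?P<attrs>.*)\})?\s*$")


def parse_fence_info(info: str) -> dict[str, str]:
    """Split a fence info string into a language plus `{...}` attributes."""
    match = FENCE_INFO_RE.match(info.strip())
    if not match:
        return {"language": info.strip()} if info.strip() else {}
    attrs: dict[str, str] = {}
    for part in shlex.split(match.group("attrs") or ""):
        if part.startswith("#") and len(part) > 1:
            attrs["id"] = part[1:]  # `#name` is shorthand for id=name
        elif "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
        else:
            attrs[part] = "true"  # a bare word acts as a boolean flag
    if match.group("language"):
        attrs["language"] = match.group("language")
    return attrs


def processor_name(attrs: dict[str, str]) -> str | None:
    """Explicit `processor` wins, then `mkt-<name>`, then `mkt` plus `type`."""
    if "processor" in attrs:
        return attrs["processor"]
    language = attrs.get("language", "")
    if language.startswith("mkt-"):
        return language.removeprefix("mkt-")
    if language == "mkt" and "type" in attrs:
        return attrs["type"]
    return None


if __name__ == "__main__":
    for info in ['python {#main tangle="src/app.py"}', "mkt-include {ref=std:clauses/payment.md}", "mkt {type=uppercase}"]:
        attrs = parse_fence_info(info)
        print(info, "->", attrs, "processor:", processor_name(attrs))
```

Because `shlex.split` does the tokenizing, quoted attribute values may contain spaces, and an ordinary language fence such as `python` yields no processor and is skipped by discovery.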
|
||||
25
src/markitect_tool/reference/__init__.py
Normal file
@@ -0,0 +1,25 @@
|
||||
"""Namespaced content reference resolution for Markdown artifacts."""
|
||||
|
||||
from markitect_tool.reference.engine import (
|
||||
ContentUnit,
|
||||
ReferenceAddress,
|
||||
ReferenceContext,
|
||||
ReferenceResolution,
|
||||
ReferenceResolutionError,
|
||||
SourceSpan,
|
||||
load_namespaces,
|
||||
parse_reference,
|
||||
resolve_reference,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"ContentUnit",
|
||||
"ReferenceAddress",
|
||||
"ReferenceContext",
|
||||
"ReferenceResolution",
|
||||
"ReferenceResolutionError",
|
||||
"SourceSpan",
|
||||
"load_namespaces",
|
||||
"parse_reference",
|
||||
"resolve_reference",
|
||||
]
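The package exports `parse_reference` for the compact reference syntax (`namespace:path`, optionally followed by `#fragment` or `::selector`). As a rough illustration of how such an address breaks down, here is a standalone sketch. The names `Address` and `split_reference` are hypothetical and exist only in this example; the sketch also skips the empty-input errors that the real parser raises.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

NAMESPACE_RE = re.compile(r"^(?P<namespace>[A-Za-z][A-Za-z0-9_.-]*):(?P<address>.*)$")


@dataclass(frozen=True)
class Address:
    """Hypothetical stand-in for the parsed reference parts."""

    namespace: str | None
    address: str
    fragment: str | None
    selector: str | None


def split_reference(raw: str) -> Address:
    """Split `ns:path#fragment` or `ns:path::selector` into its parts."""
    base = raw.strip()
    selector = None
    if "::" in base:
        # The selector is split off first, so it may itself contain `#`.
        base, selector = base.split("::", 1)
        selector = selector.strip()
    fragment = None
    if "#" in base:
        base, fragment = base.split("#", 1)
        fragment = fragment.strip()
    namespace = None
    address = base.strip()
    match = NAMESPACE_RE.match(address)
    if match:
        namespace = match.group("namespace")
        address = match.group("address").strip()
    return Address(namespace, address, fragment, selector)


if __name__ == "__main__":
    for raw in ["path/to/file.md", "std:clauses/payment.md#section:terms", "std:clauses/payment.md::sections[heading=Terms]", "#intro"]:
        print(raw, "->", split_reference(raw))
```

A pathless reference such as `#intro` produces an empty address, which the resolver treats as "the current document".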
|
||||
626
src/markitect_tool/reference/engine.py
Normal file
@@ -0,0 +1,626 @@
|
||||
"""Reference parsing and resolution for Markdown content units."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import hashlib
|
||||
import re
|
||||
import shlex
|
||||
from dataclasses import asdict, dataclass, field
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from markdown_it import MarkdownIt
|
||||
|
||||
from markitect_tool.core import ContentBlock, Document, Heading, Section, parse_markdown
|
||||
from markitect_tool.query import InvalidQueryError, QueryMatch, query_document
|
||||
|
||||
|
||||
class ReferenceResolutionError(ValueError):
|
||||
"""Raised when a content reference cannot be resolved."""
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class ReferenceAddress:
|
||||
"""Parsed content reference address.
|
||||
|
||||
Syntax is intentionally compact and Markdown-friendly:
|
||||
|
||||
- ``path/to/file.md``
|
||||
- ``std:clauses/payment.md``
|
||||
- ``std:clauses/payment.md#section:terms``
|
||||
- ``std:clauses/payment.md::sections[heading=Terms]``
|
||||
- ``#intro`` for a fragment in the current document
|
||||
"""
|
||||
|
||||
raw: str
|
||||
namespace: str | None = None
|
||||
address: str = ""
|
||||
fragment: str | None = None
|
||||
selector: str | None = None
|
||||
|
||||
def to_dict(self) -> dict[str, Any]:
|
||||
return {
|
||||
key: value
|
||||
for key, value in asdict(self).items()
|
||||
if value is not None and value != ""
|
||||
}
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class ReferenceContext:
|
||||
"""Inputs used to resolve namespaced and relative content references."""
|
||||
|
||||
root: Path = Path(".")
|
||||
current_path: Path | None = None
|
||||
namespaces: dict[str, str] = field(default_factory=dict)
|
||||
|
||||
@classmethod
|
||||
def from_document(
|
||||
cls,
|
||||
document: Document,
|
||||
*,
|
||||
root: str | Path = ".",
|
||||
current_path: str | Path | None = None,
|
||||
) -> "ReferenceContext":
|
||||
"""Build a reference context from document frontmatter."""
|
||||
|
||||
source_path = current_path or document.source_path
|
||||
return cls(
|
||||
root=Path(root),
|
||||
current_path=Path(source_path) if source_path else None,
|
||||
namespaces=load_namespaces(document.frontmatter),
|
||||
)
|
||||
|
||||
def to_dict(self) -> dict[str, Any]:
|
||||
data = {
|
||||
"root": str(self.root),
|
||||
"current_path": str(self.current_path) if self.current_path else None,
|
||||
"namespaces": self.namespaces,
|
||||
}
|
||||
return {key: value for key, value in data.items() if value is not None}
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class SourceSpan:
|
||||
"""Line span for a resolved unit in its source file."""
|
||||
|
||||
line_start: int | None = None
|
||||
line_end: int | None = None
|
||||
|
||||
def to_dict(self) -> dict[str, Any]:
|
||||
return {key: value for key, value in asdict(self).items() if value is not None}
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class ContentUnit:
|
||||
"""One addressable content unit resolved from Markdown."""
|
||||
|
||||
kind: str
|
||||
unit_id: str
|
||||
text: str
|
||||
source_path: str
|
||||
span: SourceSpan | None = None
|
||||
name: str | None = None
|
||||
content_hash: str = ""
|
||||
metadata: dict[str, Any] = field(default_factory=dict)
|
||||
|
||||
def to_dict(self) -> dict[str, Any]:
|
||||
data = {
|
||||
"kind": self.kind,
|
||||
"unit_id": self.unit_id,
|
||||
"name": self.name,
|
||||
"source_path": self.source_path,
|
||||
"span": self.span.to_dict() if self.span else None,
|
||||
"content_hash": self.content_hash,
|
||||
"metadata": self.metadata or None,
|
||||
"text": self.text,
|
||||
}
|
||||
return {key: value for key, value in data.items() if value is not None}
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class ReferenceResolution:
|
||||
"""Resolved content reference and its dependency edge."""
|
||||
|
||||
reference: ReferenceAddress
|
||||
source_path: str
|
||||
target_path: str
|
||||
units: list[ContentUnit]
|
||||
|
||||
def to_dict(self) -> dict[str, Any]:
|
||||
return {
|
||||
"reference": self.reference.to_dict(),
|
||||
"source_path": self.source_path,
|
||||
"target_path": self.target_path,
|
||||
"count": len(self.units),
|
||||
"units": [unit.to_dict() for unit in self.units],
|
||||
}
|
||||
|
||||
|
||||
_NAMESPACE_RE = re.compile(r"^(?P<namespace>[A-Za-z][A-Za-z0-9_.-]*):(?P<address>.*)$")
|
||||
_HEADING_ID_RE = re.compile(r"^(?P<title>.*?)(?:\s+\{#(?P<id>[A-Za-z0-9_.:-]+)\})?$")
|
||||
_REGION_OPEN_RE = re.compile(r"<!--\s*mkt:region\s+(?P<attrs>.*?)\s*-->")
|
||||
_REGION_CLOSE_RE = re.compile(r"<!--\s*/mkt:region\s*-->")
|
||||
_FENCE_ATTRS_RE = re.compile(r"^(?P<language>[^\s{]+)?(?:\s+\{(?P<attrs>.*)\})?\s*$")
|
||||
|
||||
|
||||
def parse_reference(reference: str) -> ReferenceAddress:
|
||||
"""Parse a compact Markitect content reference."""
|
||||
|
||||
raw = reference.strip()
|
||||
if not raw:
|
||||
raise ReferenceResolutionError("Reference cannot be empty")
|
||||
|
||||
selector: str | None = None
|
||||
base = raw
|
||||
if "::" in base:
|
||||
base, selector = base.split("::", 1)
|
||||
selector = selector.strip()
|
||||
if not selector:
|
||||
raise ReferenceResolutionError(f"Reference selector is empty in `{reference}`")
|
||||
|
||||
fragment: str | None = None
|
||||
if "#" in base:
|
||||
base, fragment = base.split("#", 1)
|
||||
fragment = fragment.strip()
|
||||
if not fragment:
|
||||
raise ReferenceResolutionError(f"Reference fragment is empty in `{reference}`")
|
||||
|
||||
namespace: str | None = None
|
||||
address = base.strip()
|
||||
match = _NAMESPACE_RE.match(address)
|
||||
if match and "/" not in match.group("namespace") and "\\" not in match.group("namespace"):
|
||||
namespace = match.group("namespace")
|
||||
address = match.group("address").strip()
|
||||
|
||||
return ReferenceAddress(
|
||||
raw=raw,
|
||||
namespace=namespace,
|
||||
address=address,
|
||||
fragment=fragment,
|
||||
selector=selector,
|
||||
)
|
||||
|
||||
|
||||
def load_namespaces(frontmatter: dict[str, Any]) -> dict[str, str]:
|
||||
"""Load namespace mappings from Markdown frontmatter."""
|
||||
|
||||
raw_namespaces = frontmatter.get("namespaces", {})
|
||||
if raw_namespaces is None:
|
||||
return {}
|
||||
if not isinstance(raw_namespaces, dict):
|
||||
raise ReferenceResolutionError("Frontmatter `namespaces` must be a mapping")
|
||||
|
||||
namespaces: dict[str, str] = {}
|
||||
for raw_key, raw_value in raw_namespaces.items():
|
||||
key = str(raw_key).strip().rstrip(":")
|
||||
if not key:
|
||||
raise ReferenceResolutionError("Namespace keys cannot be empty")
|
||||
if not _NAMESPACE_RE.match(f"{key}:"):
|
||||
raise ReferenceResolutionError(f"Invalid namespace key `{raw_key}`")
|
||||
if not isinstance(raw_value, str):
|
||||
raise ReferenceResolutionError(f"Namespace `{key}` must map to a string path")
|
||||
value = raw_value.strip()
|
||||
if not value:
|
||||
raise ReferenceResolutionError(f"Namespace `{key}` cannot map to an empty path")
|
||||
namespaces[key] = value
|
||||
return namespaces
|
||||
|
||||
|
||||
def resolve_reference(
|
||||
reference: str | ReferenceAddress,
|
||||
*,
|
||||
context: ReferenceContext,
|
||||
) -> ReferenceResolution:
|
||||
"""Resolve a content reference to one or more content units."""
|
||||
|
||||
address = parse_reference(reference) if isinstance(reference, str) else reference
|
||||
root = context.root.resolve()
|
||||
source_path = context.current_path.resolve() if context.current_path else root
|
||||
target_path = _resolve_target_path(address, context, root, source_path)
|
||||
if not target_path.exists() or not target_path.is_file():
|
||||
raise ReferenceResolutionError(f"Referenced file not found: {target_path}")
|
||||
|
||||
markdown = target_path.read_text(encoding="utf-8")
|
||||
document = parse_markdown(markdown, source_path=str(target_path))
|
||||
|
||||
if address.selector and address.fragment:
|
||||
raise ReferenceResolutionError("Reference cannot use both fragment and selector")
|
||||
if address.selector:
|
||||
units = _units_from_selector(document, address.selector, target_path)
|
||||
elif address.fragment:
|
||||
units = _units_from_fragment(document, address.fragment, target_path, markdown)
|
||||
else:
|
||||
units = [_document_unit(document, target_path, markdown)]
|
||||
|
||||
if not units:
|
||||
raise ReferenceResolutionError(f"Reference `{address.raw}` did not match any content units")
|
||||
|
||||
return ReferenceResolution(
|
||||
reference=address,
|
||||
source_path=str(source_path),
|
||||
target_path=str(target_path),
|
||||
units=units,
|
||||
)
|
||||
|
||||
|
||||
def _resolve_target_path(
|
||||
address: ReferenceAddress,
|
||||
context: ReferenceContext,
|
||||
root: Path,
|
||||
source_path: Path,
|
||||
) -> Path:
|
||||
if address.namespace:
|
||||
if address.namespace not in context.namespaces:
|
||||
raise ReferenceResolutionError(f"Unknown namespace `{address.namespace}`")
|
||||
namespace_target = _path_from_namespace(context.namespaces[address.namespace], root)
|
||||
candidate = namespace_target / address.address if namespace_target.is_dir() else namespace_target
|
||||
elif address.address:
|
||||
base_dir = source_path.parent if source_path.is_file() else root
|
||||
candidate = Path(address.address)
|
||||
candidate = candidate if candidate.is_absolute() else base_dir / candidate
|
||||
elif context.current_path:
|
||||
candidate = context.current_path
|
||||
else:
|
||||
raise ReferenceResolutionError("Pathless references require a current document")
|
||||
|
||||
resolved = candidate.resolve()
|
||||
try:
|
||||
resolved.relative_to(root)
|
||||
except ValueError as exc:
|
||||
raise ReferenceResolutionError(f"Reference escapes root: {address.raw}") from exc
|
||||
return resolved
|
||||
|
||||
|
||||
def _path_from_namespace(raw_path: str, root: Path) -> Path:
|
||||
path = Path(raw_path)
|
||||
if not path.is_absolute():
|
||||
path = root / path
|
||||
return path.resolve()
|
||||
|
||||
|
||||
def _units_from_selector(
|
||||
document: Document,
|
||||
selector: str,
|
||||
target_path: Path,
|
||||
) -> list[ContentUnit]:
|
||||
try:
|
||||
matches = query_document(document, selector)
|
||||
except InvalidQueryError as exc:
|
||||
raise ReferenceResolutionError(str(exc)) from exc
|
||||
return [_unit_from_query_match(match, target_path) for match in matches]
|
||||
|
||||
|
||||
def _units_from_fragment(
    document: Document,
    fragment: str,
    target_path: Path,
    markdown: str,
) -> list[ContentUnit]:
    # A fragment is either `kind:value` or a bare value, which is treated as an id.
    kind, _, value = fragment.partition(":")
    if not value:
        kind, value = "id", kind
    lookup = _slug(value)

    if kind == "document":
        return [_document_unit(document, target_path, markdown)]
    if kind == "id":
        # Search unit kinds in priority order and return the first kind that matches.
        for units in [
            _section_units(document, target_path),
            _region_units(markdown, target_path),
            _fenced_block_units(markdown, target_path),
            _heading_units(document, target_path),
        ]:
            matches = [
                unit for unit in units if unit.unit_id == lookup or _slug(unit.name or "") == lookup
            ]
            if matches:
                return matches
        return []
    # `id` is fully handled above, so only `section` can reach this branch.
    if kind == "section":
        sections = _section_units(document, target_path)
        return [unit for unit in sections if unit.unit_id == lookup or _slug(unit.name or "") == lookup]
    if kind == "heading":
        headings = _heading_units(document, target_path)
        return [unit for unit in headings if unit.unit_id == lookup or _slug(unit.name or "") == lookup]
    if kind == "block":
        return _block_fragment_units(document, target_path, value)
    if kind == "region":
        return [unit for unit in _region_units(markdown, target_path) if unit.unit_id == lookup]
    if kind == "fence":
        return [unit for unit in _fenced_block_units(markdown, target_path) if unit.unit_id == lookup]
    if kind == "tag":
        return [
            unit
            for unit in _region_units(markdown, target_path) + _fenced_block_units(markdown, target_path)
            if lookup in {_slug(tag) for tag in unit.metadata.get("tags", [])}
        ]
    if kind == "line":
        return _line_range_units(markdown, target_path, value)
    raise ReferenceResolutionError(f"Unsupported reference fragment kind `{kind}`")
|
||||
|
||||
|
||||
def _document_unit(document: Document, target_path: Path, markdown: str) -> ContentUnit:
|
||||
unit_id = _slug(str(document.frontmatter.get("id") or target_path.stem))
|
||||
return _content_unit(
|
||||
kind="document",
|
||||
unit_id=unit_id,
|
||||
text=markdown,
|
||||
source_path=target_path,
|
||||
span=SourceSpan(1, len(markdown.splitlines())),
|
||||
name=str(document.frontmatter.get("title") or target_path.stem),
|
||||
metadata={"frontmatter": document.frontmatter},
|
||||
)
|
||||
|
||||
|
||||
def _unit_from_query_match(match: QueryMatch, target_path: Path) -> ContentUnit:
|
||||
unit_id = _slug(match.path.replace("$.", "").replace("[", "-").replace("]", ""))
|
||||
name = match.text.splitlines()[0].lstrip("# ").strip() if match.text else match.kind
|
||||
return _content_unit(
|
||||
kind=match.kind,
|
||||
unit_id=unit_id,
|
||||
text=match.text if match.text is not None else str(match.value),
|
||||
source_path=target_path,
|
||||
span=SourceSpan(match.line, None),
|
||||
name=name,
|
||||
metadata={"query_path": match.path, "value": match.value},
|
||||
)
|
||||
|
||||
|
||||
def _section_units(document: Document, target_path: Path) -> list[ContentUnit]:
|
||||
used_ids: dict[str, int] = {}
|
||||
return [
|
||||
_section_unit(section, target_path, used_ids)
|
||||
for section in document.sections
|
||||
]
|
||||
|
||||
|
||||
def _section_unit(
|
||||
section: Section,
|
||||
target_path: Path,
|
||||
used_ids: dict[str, int],
|
||||
) -> ContentUnit:
|
||||
title, explicit_id = _heading_title_and_id(section.heading)
|
||||
unit_id = _dedupe_id(_slug(explicit_id or title), used_ids)
|
||||
line_end = section.blocks[-1].line_end if section.blocks else section.heading.line
|
||||
lines = [f"{'#' * section.heading.level} {section.heading.text}"]
|
||||
for block in section.blocks:
|
||||
if block.text:
|
||||
lines.extend(["", block.text])
|
||||
return _content_unit(
|
||||
kind="section",
|
||||
unit_id=unit_id,
|
||||
text="\n".join(lines).strip(),
|
||||
source_path=target_path,
|
||||
span=SourceSpan(section.heading.line, line_end),
|
||||
name=title,
|
||||
metadata={"heading_level": section.heading.level},
|
||||
)
|
||||
|
||||
|
||||
def _heading_units(document: Document, target_path: Path) -> list[ContentUnit]:
|
||||
used_ids: dict[str, int] = {}
|
||||
units: list[ContentUnit] = []
|
||||
for heading in document.headings:
|
||||
title, explicit_id = _heading_title_and_id(heading)
|
||||
unit_id = _dedupe_id(_slug(explicit_id or title), used_ids)
|
||||
units.append(
|
||||
_content_unit(
|
||||
kind="heading",
|
||||
unit_id=unit_id,
|
||||
text=f"{'#' * heading.level} {heading.text}",
|
||||
source_path=target_path,
|
||||
span=SourceSpan(heading.line, heading.line),
|
||||
name=title,
|
||||
metadata={"heading_level": heading.level},
|
||||
)
|
||||
)
|
||||
return units
|
||||
|
||||
|
||||
def _block_fragment_units(
|
||||
document: Document,
|
||||
target_path: Path,
|
||||
value: str,
|
||||
) -> list[ContentUnit]:
|
||||
blocks = _block_units(document.blocks, target_path)
|
||||
if value.isdigit():
|
||||
index = int(value)
|
||||
return [blocks[index]] if 0 <= index < len(blocks) else []
|
||||
lookup = _slug(value)
|
||||
return [unit for unit in blocks if unit.unit_id == lookup]
|
||||
|
||||
|
||||
def _block_units(blocks: list[ContentBlock], target_path: Path) -> list[ContentUnit]:
|
||||
used_ids: dict[str, int] = {}
|
||||
units: list[ContentUnit] = []
|
||||
for index, block in enumerate(blocks):
|
||||
base_id = f"{block.type}-{block.line_start or index}"
|
||||
units.append(
|
||||
_content_unit(
|
||||
kind=block.type,
|
||||
unit_id=_dedupe_id(_slug(base_id), used_ids),
|
||||
text=block.text,
|
||||
source_path=target_path,
|
||||
span=SourceSpan(block.line_start, block.line_end),
|
||||
name=block.type,
|
||||
metadata={"block_index": index},
|
||||
)
|
||||
)
|
||||
return units
|
||||
|
||||
|
||||
def _region_units(markdown: str, target_path: Path) -> list[ContentUnit]:
|
||||
lines = markdown.splitlines()
|
||||
units: list[ContentUnit] = []
|
||||
open_region: tuple[int, str, list[str]] | None = None
|
||||
|
||||
for index, line in enumerate(lines, start=1):
|
||||
open_match = _REGION_OPEN_RE.search(line)
|
||||
close_match = _REGION_CLOSE_RE.search(line)
|
||||
if open_match and open_region is not None:
|
||||
raise ReferenceResolutionError("Nested mkt:region blocks are not supported")
|
||||
if close_match:
|
||||
if open_region is None:
|
||||
raise ReferenceResolutionError("Region close marker has no matching open marker")
|
||||
start_line, region_id, tags = open_region
|
||||
content_lines = lines[start_line:index - 1]
|
||||
units.append(
|
||||
_content_unit(
|
||||
kind="region",
|
||||
unit_id=_slug(region_id),
|
||||
text="\n".join(content_lines).strip(),
|
||||
source_path=target_path,
|
||||
span=SourceSpan(start_line, index),
|
||||
name=region_id,
|
||||
metadata={"tags": tags},
|
||||
)
|
||||
)
|
||||
open_region = None
|
||||
continue
|
||||
if open_match:
|
||||
attrs = _parse_attrs(open_match.group("attrs"))
|
||||
region_id = attrs.get("id")
|
||||
if not region_id:
|
||||
raise ReferenceResolutionError("Region marker requires an id attribute")
|
||||
open_region = (index, region_id, _tags_from_attrs(attrs))
|
||||
|
||||
if open_region is not None:
|
||||
raise ReferenceResolutionError("Region open marker has no matching close marker")
|
||||
return units
|
||||
|
||||
|
||||
def _fenced_block_units(markdown: str, target_path: Path) -> list[ContentUnit]:
|
||||
parser = MarkdownIt("commonmark", {"tables": True}).enable("table")
|
||||
units: list[ContentUnit] = []
|
||||
used_ids: dict[str, int] = {}
|
||||
for index, token in enumerate(parser.parse(markdown)):
|
||||
if token.type != "fence":
|
||||
continue
|
||||
attrs = _parse_fence_info(token.info)
|
||||
unit_id = attrs.get("id")
|
||||
if not unit_id:
|
||||
continue
|
||||
line_start = token.map[0] + 1 if token.map else None
|
||||
line_end = token.map[1] if token.map else None
|
||||
units.append(
|
||||
_content_unit(
|
||||
kind="fenced_block",
|
||||
unit_id=_dedupe_id(_slug(unit_id), used_ids),
|
||||
text=token.content,
|
||||
source_path=target_path,
|
||||
span=SourceSpan(line_start, line_end),
|
||||
name=unit_id,
|
||||
metadata={
|
||||
"language": attrs.get("language"),
|
||||
"tags": _tags_from_attrs(attrs),
|
||||
"attrs": {
|
||||
key: value
|
||||
for key, value in attrs.items()
|
||||
if key not in {"id", "language", "tag", "tags"}
|
||||
},
|
||||
"block_index": index,
|
||||
},
|
||||
)
|
||||
)
|
||||
return units
|
||||
|
||||
|
||||
def _line_range_units(markdown: str, target_path: Path, value: str) -> list[ContentUnit]:
|
||||
match = re.match(r"^(?P<start>\d+)(?:-(?P<end>\d+))?$", value)
|
||||
if not match:
|
||||
raise ReferenceResolutionError("Line fragments must use `line:start` or `line:start-end`")
|
||||
start = int(match.group("start"))
|
||||
end = int(match.group("end") or start)
|
||||
lines = markdown.splitlines()
|
||||
if start < 1 or end < start or end > len(lines):
|
||||
return []
|
||||
text = "\n".join(lines[start - 1:end])
|
||||
return [
|
||||
_content_unit(
|
||||
kind="line_range",
|
||||
unit_id=f"line-{start}-{end}",
|
||||
text=text,
|
||||
source_path=target_path,
|
||||
span=SourceSpan(start, end),
|
||||
name=f"lines {start}-{end}",
|
||||
metadata={},
|
||||
)
|
||||
]
|
||||
|
||||
|
||||
def _parse_fence_info(info: str) -> dict[str, str]:
|
||||
match = _FENCE_ATTRS_RE.match(info.strip())
|
||||
if not match:
|
||||
return {"language": info.strip()} if info.strip() else {}
|
||||
attrs = _parse_attrs(match.group("attrs") or "")
|
||||
language = match.group("language")
|
||||
if language:
|
||||
attrs["language"] = language
|
||||
if "id" not in attrs and attrs:
|
||||
for key in list(attrs):
|
||||
if key.startswith("#"):
|
||||
attrs["id"] = key[1:]
|
||||
del attrs[key]
|
||||
break
|
||||
return attrs
|
||||
|
||||
|
||||
def _parse_attrs(raw: str) -> dict[str, str]:
|
||||
attrs: dict[str, str] = {}
|
||||
for part in shlex.split(raw):
|
||||
if part.startswith("#") and len(part) > 1:
|
||||
attrs["id"] = part[1:]
|
||||
continue
|
||||
if "=" not in part:
|
||||
attrs[part] = "true"
|
||||
continue
|
||||
key, value = part.split("=", 1)
|
||||
attrs[key.strip()] = value.strip()
|
||||
return attrs
|
||||
|
||||
|
||||
def _tags_from_attrs(attrs: dict[str, str]) -> list[str]:
|
||||
raw = attrs.get("tags") or attrs.get("tag") or ""
|
||||
return [tag.strip() for tag in re.split(r"[, ]+", raw) if tag.strip()]
|
||||
|
||||
|
||||
def _content_unit(
|
||||
*,
|
||||
kind: str,
|
||||
unit_id: str,
|
||||
text: str,
|
||||
source_path: Path,
|
||||
span: SourceSpan | None,
|
||||
name: str | None,
|
||||
metadata: dict[str, Any] | None = None,
|
||||
) -> ContentUnit:
|
||||
return ContentUnit(
|
||||
kind=kind,
|
||||
unit_id=unit_id,
|
||||
text=text,
|
||||
source_path=str(source_path),
|
||||
span=span,
|
||||
name=name,
|
||||
content_hash="sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest(),
|
||||
metadata=metadata or {},
|
||||
)
|
||||
|
||||
|
||||
def _heading_title_and_id(heading: Heading) -> tuple[str, str | None]:
|
||||
match = _HEADING_ID_RE.match(heading.text.strip())
|
||||
if not match:
|
||||
return heading.text.strip(), None
|
||||
return match.group("title").strip(), match.group("id")
|
||||
|
||||
|
||||
def _dedupe_id(unit_id: str, used_ids: dict[str, int]) -> str:
|
||||
count = used_ids.get(unit_id, 0) + 1
|
||||
used_ids[unit_id] = count
|
||||
return unit_id if count == 1 else f"{unit_id}-{count}"
|
||||
|
||||
|
||||
def _slug(value: str) -> str:
|
||||
slug = re.sub(r"[^a-z0-9_.:-]+", "-", value.strip().lower())
|
||||
slug = re.sub(r"-+", "-", slug).strip("-")
|
||||
return slug or "unit"
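The two helpers that close this module, `_dedupe_id` and `_slug`, together decide the ids that fragments resolve against. A minimal standalone sketch of the same logic follows; the names `slug` and `dedupe_id` and the sample titles are assumptions made for the example.

```python
from __future__ import annotations

import re


def slug(value: str) -> str:
    """Lowercase, replace runs of disallowed characters with `-`, trim dashes."""
    collapsed = re.sub(r"[^a-z0-9_.:-]+", "-", value.strip().lower())
    return re.sub(r"-+", "-", collapsed).strip("-") or "unit"


def dedupe_id(unit_id: str, used: dict[str, int]) -> str:
    """Return `unit_id` on first use, then `unit_id-2`, `unit_id-3`, and so on."""
    count = used.get(unit_id, 0) + 1
    used[unit_id] = count
    return unit_id if count == 1 else f"{unit_id}-{count}"


if __name__ == "__main__":
    used: dict[str, int] = {}
    for title in ["Intro", "Payment Terms", "Intro"]:
        print(title, "->", dedupe_id(slug(title), used))
```

One limitation of a counter-based scheme like this: a generated suffix can coincide with an id that was written explicitly. With the titles `Intro`, `Intro-2`, `Intro`, the third heading is also assigned `intro-2`, so callers that need strictly unique ids would have to check the generated value against the ids already handed out.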
|
||||
106
tests/test_content_class_resolution.py
Normal file
@@ -0,0 +1,106 @@
|
||||
from pathlib import Path
|
||||
|
||||
from click.testing import CliRunner
|
||||
|
||||
from markitect_tool.cli import main
|
||||
from markitect_tool.content_class import load_content_classes
|
||||
|
||||
|
||||
def test_c3_linearization_for_diamond_inheritance():
|
||||
registry = load_content_classes(
|
||||
{
|
||||
"classes": {
|
||||
"base": {"slots": {"sections": ["Overview"]}},
|
||||
"left": {"extends": ["base"], "slots": {"sections": ["Left"]}},
|
||||
"right": {"extends": ["base"], "slots": {"sections": ["Right"]}},
|
||||
"leaf": {"extends": ["left", "right"], "slots": {"title": "Leaf"}},
|
||||
}
|
||||
}
|
||||
)
|
||||
|
||||
assert registry.linearize("leaf") == ["leaf", "left", "right", "base"]
|
||||
|
||||
|
||||
def test_compose_merges_slots_with_explicit_policies():
|
||||
registry = load_content_classes(
|
||||
{
|
||||
"classes": {
|
||||
"base": {
|
||||
"slots": {
|
||||
"sections": ["Overview"],
|
||||
"assertions": {"tone": "plain", "depth": "short"},
|
||||
}
|
||||
},
|
||||
"market": {
|
||||
"extends": ["base"],
|
||||
"slots": {
|
||||
"sections": ["Pricing"],
|
||||
"assertions": {"depth": "detailed"},
|
||||
},
|
||||
"merge_policies": {
|
||||
"sections": "append",
|
||||
"assertions": "deep_merge",
|
||||
},
|
||||
},
|
||||
"instance": {
|
||||
"extends": ["market"],
|
||||
"slots": {"sections": ["Risks"]},
|
||||
"merge_policies": {"sections": "append"},
|
||||
},
|
||||
}
|
||||
}
|
||||
)
|
||||
|
||||
result = registry.compose("instance")
|
||||
|
||||
assert result.valid
|
||||
assert result.slots["sections"] == ["Overview", "Pricing", "Risks"]
|
||||
assert result.slots["assertions"] == {"tone": "plain", "depth": "detailed"}
|
||||
|
||||
|
||||
def test_compose_reports_error_on_conflict():
|
||||
registry = load_content_classes(
|
||||
{
|
||||
"classes": {
|
||||
"base": {"slots": {"owner": "A"}},
|
||||
"instance": {
|
||||
"extends": ["base"],
|
||||
"slots": {"owner": "B"},
|
||||
"merge_policies": {"owner": "error_on_conflict"},
|
||||
},
|
||||
}
|
||||
}
|
||||
)
|
||||
|
||||
result = registry.compose("instance")
|
||||
|
||||
assert not result.valid
|
||||
assert result.diagnostics[0].code == "content_class.merge_conflict"
|
||||
|
||||
|
||||
def test_mkt_class_resolve_outputs_text(tmp_path: Path):
|
||||
class_file = tmp_path / "classes.yaml"
|
||||
class_file.write_text(
|
||||
"""classes:
|
||||
base:
|
||||
slots:
|
||||
sections:
|
||||
- Overview
|
||||
instance:
|
||||
extends:
|
||||
- base
|
||||
slots:
|
||||
sections:
|
||||
- Risks
|
||||
merge_policies:
|
||||
sections: append
|
||||
""",
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
result = CliRunner().invoke(main, ["class", "resolve", str(class_file), "instance"])
|
||||
|
||||
assert result.exit_code == 0
|
||||
assert "linearization: instance -> base" in result.output
|
||||
assert "Overview" in result.output
|
||||
assert "Risks" in result.output
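The diamond test above expects the order `leaf, left, right, base`. For readers unfamiliar with C3, the following is a minimal sketch of the algorithm that reproduces that order. It is not the project's `linearize` implementation, which is not shown in this part of the diff; the function name `c3_linearize` and the plain `dict` of parent lists are assumptions made for the example.

```python
from __future__ import annotations


def c3_linearize(name: str, parents: dict[str, list[str]]) -> list[str]:
    """Return the C3 linearization of `name`, most specific class first."""

    def merge(sequences: list[list[str]]) -> list[str]:
        result: list[str] = []
        pending = [list(seq) for seq in sequences if seq]
        while pending:
            for seq in pending:
                head = seq[0]
                # A head is acceptable only if no sequence lists it after its first item.
                if not any(head in other[1:] for other in pending):
                    break
            else:
                raise ValueError("Inconsistent precedence: no valid C3 order")
            result.append(head)
            pending = [[c for c in seq if c != head] for seq in pending]
            pending = [seq for seq in pending if seq]
        return result

    def linearize(cls: str, stack: tuple[str, ...]) -> list[str]:
        if cls in stack:
            raise ValueError(f"Inheritance cycle at {cls}")
        if cls not in parents:
            raise ValueError(f"Unknown class {cls}")
        bases = parents[cls]
        sequences = [linearize(base, stack + (cls,)) for base in bases] + [list(bases)]
        return [cls] + merge(sequences)

    return linearize(name, ())


if __name__ == "__main__":
    diamond = {"base": [], "left": ["base"], "right": ["base"], "leaf": ["left", "right"]}
    print(c3_linearize("leaf", diamond))
```

The linearization lists the leaf first. Merging slots "from base to leaf", as the compose tests expect (`Overview`, then `Pricing`, then `Risks`), would correspond to walking this list in reverse.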
|
||||
93
tests/test_explode_implode.py
Normal file
@@ -0,0 +1,93 @@
from pathlib import Path

import pytest
from click.testing import CliRunner

from markitect_tool.cli import main
from markitect_tool.explode import (
    EXPLODE_MANIFEST_NAME,
    ExplodeError,
    explode_markdown_file,
    implode_markdown_directory,
)


ROUNDTRIP_DOC = """---
title: Explode Example
---

Opening text before the first heading.

# Intro

Intro body.

## Detail

Detail body.

# Later

Later body.
"""


def test_flat_explode_implode_roundtrips_exact_markdown(tmp_path: Path):
    source = tmp_path / "source.md"
    output_dir = tmp_path / "exploded"
    source.write_text(ROUNDTRIP_DOC, encoding="utf-8")

    result = explode_markdown_file(source, output_dir, variant="flat")
    imploded = implode_markdown_directory(output_dir)

    assert Path(result.manifest_path).name == EXPLODE_MANIFEST_NAME
    assert (output_dir / "00-preamble.md").exists()
    assert (output_dir / "sections" / "01-intro.md").exists()
    assert imploded.markdown == ROUNDTRIP_DOC
    assert imploded.current_hash == result.manifest.source_hash


def test_hierarchical_explode_places_child_sections_under_parent(tmp_path: Path):
    source = tmp_path / "source.md"
    output_dir = tmp_path / "exploded"
    source.write_text(ROUNDTRIP_DOC, encoding="utf-8")

    result = explode_markdown_file(source, output_dir, variant="hierarchical")

    files = {Path(path).relative_to(output_dir).as_posix() for path in result.written_files}
    assert "01-intro.md" in files
    assert "01-intro/02-detail.md" in files
    assert implode_markdown_directory(output_dir).markdown == ROUNDTRIP_DOC


def test_explode_rejects_non_empty_output_without_force(tmp_path: Path):
    source = tmp_path / "source.md"
    output_dir = tmp_path / "exploded"
    output_dir.mkdir()
    (output_dir / "existing.md").write_text("Existing", encoding="utf-8")
    source.write_text(ROUNDTRIP_DOC, encoding="utf-8")

    with pytest.raises(ExplodeError, match="not empty"):
        explode_markdown_file(source, output_dir)


def test_mkt_explode_and_implode(tmp_path: Path):
    source = tmp_path / "source.md"
    output_dir = tmp_path / "exploded"
    rebuilt = tmp_path / "rebuilt.md"
    source.write_text(ROUNDTRIP_DOC, encoding="utf-8")
    runner = CliRunner()

    explode_result = runner.invoke(
        main,
        ["explode", str(source), "--output-dir", str(output_dir), "--variant", "flat"],
    )
    implode_result = runner.invoke(
        main,
        ["implode", str(output_dir), "--output", str(rebuilt)],
    )

    assert explode_result.exit_code == 0
    assert "entries: 4" in explode_result.output
    assert implode_result.exit_code == 0
    assert rebuilt.read_text(encoding="utf-8") == ROUNDTRIP_DOC
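The roundtrip tests above require implode to reproduce the source byte for byte. One way to get that property, shown here purely as an assumption-laden sketch, is to split on heading lines while keeping every character (including line endings) in exactly one part. The functions `explode` and `implode` below are hypothetical stand-ins, not `markitect_tool.explode`, and they ignore manifests, file naming, and hashing.

```python
import re

HEADING = re.compile(r"^#{1,6} ")
FENCE_MARKERS = ("`" * 3, "~" * 3)


def explode(markdown):
    """Split into a preamble plus one part per heading, keeping every character."""
    parts = [""]
    in_fence = False
    for line in markdown.splitlines(keepends=True):
        if line.lstrip().startswith(FENCE_MARKERS):
            # Simplification: any fence marker toggles the fenced state.
            in_fence = not in_fence
        elif not in_fence and HEADING.match(line):
            parts.append("")
        parts[-1] += line
    return parts


def implode(parts):
    """Concatenate parts in order; no separators are added or removed."""
    return "".join(parts)


doc = (
    "---\ntitle: Explode Example\n---\n\nOpening text.\n\n"
    "# Intro\n\nIntro body.\n\n## Detail\n\nDetail body.\n\n# Later\n\nLater body.\n"
)
parts = explode(doc)
```

Because nothing is normalized on the way out, exactness on the way back is a matter of concatenating in recorded order, which is presumably the role the manifest plays in the real package.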
91
tests/test_literate_weave_tangle.py
Normal file
@@ -0,0 +1,91 @@
from pathlib import Path

from click.testing import CliRunner

from markitect_tool.cli import main
from markitect_tool.literate import (
    discover_code_chunks,
    tangle_markdown,
    weave_markdown,
    write_tangle_files,
)


LITERATE_DOC = """# Literate Example

```python {#helpers}
def helper():
    return "ready"
```

```python {#main tangle="src/app.py"}
<<helpers>>

def main():
    return helper()
```
"""


def test_discover_code_chunks_with_references_and_targets():
    chunks = discover_code_chunks(LITERATE_DOC, source_path="example.md")

    assert [chunk.chunk_id for chunk in chunks] == ["helpers", "main"]
    assert chunks[1].target_path == "src/app.py"
    assert chunks[1].references == ["helpers"]


def test_tangle_expands_named_chunk_references():
    result = tangle_markdown(LITERATE_DOC, source_path="example.md")

    assert result.valid
    assert len(result.files) == 1
    assert result.files[0].path == "src/app.py"
    assert "def helper" in result.files[0].content
    assert "<<helpers>>" not in result.files[0].content
    assert result.provenance[0].operation == "literate.tangle"


def test_tangle_reports_missing_chunk_reference():
    markdown = """```python {#main tangle="src/app.py"}
<<missing>>
```
"""

    result = tangle_markdown(markdown, source_path="example.md")

    assert not result.valid
    assert result.diagnostics[0].code == "literate.missing_chunk"


def test_weave_appends_chunk_index():
    result = weave_markdown(LITERATE_DOC, source_path="example.md")

    assert "## Code Chunk Index" in result.markdown
    assert "`main` -> `src/app.py`; refs: `helpers`" in result.markdown


def test_write_tangle_files(tmp_path: Path):
    result = tangle_markdown(LITERATE_DOC, source_path="example.md")

    written = write_tangle_files(result, tmp_path)

    assert written == [str(tmp_path / "src" / "app.py")]
    assert "def main" in (tmp_path / "src" / "app.py").read_text(encoding="utf-8")


def test_mkt_tangle_and_weave(tmp_path: Path):
    source = tmp_path / "literate.md"
    output_dir = tmp_path / "out"
    woven = tmp_path / "woven.md"
    source.write_text(LITERATE_DOC, encoding="utf-8")
    runner = CliRunner()

    tangle_result = runner.invoke(main, ["tangle", str(source), "--output-dir", str(output_dir)])
    weave_result = runner.invoke(main, ["weave", str(source), "--output", str(woven)])

    assert tangle_result.exit_code == 0
    assert "files: 1" in tangle_result.output
    assert (output_dir / "src" / "app.py").exists()
    assert weave_result.exit_code == 0
    assert "## Code Chunk Index" in woven.read_text(encoding="utf-8")
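The tangle tests above expect `<<helpers>>` to be replaced by the body of the named chunk, and a missing name to be reported. A minimal sketch of noweb-style expansion follows, assuming chunks are already collected into a `dict` from chunk ID to body text. `expand_chunk` is a hypothetical helper, not the `markitect_tool.literate` API, and it raises exceptions where the real package returns diagnostics such as `literate.missing_chunk`.

```python
import re

# A reference occupies its own line, optionally indented: <<chunk-id>>
CHUNK_REF = re.compile(r"^(?P<indent>[ \t]*)<<(?P<name>[\w-]+)>>[ \t]*$")


def expand_chunk(name, chunks, stack=()):
    """Recursively expand chunk references, detecting missing and cyclic names."""
    if name in stack:
        raise ValueError("cyclic chunk reference: " + " -> ".join(stack + (name,)))
    if name not in chunks:
        raise KeyError(f"missing chunk: {name}")
    lines = []
    for line in chunks[name].splitlines():
        match = CHUNK_REF.match(line)
        if match is None:
            lines.append(line)
            continue
        expanded = expand_chunk(match["name"], chunks, stack + (name,))
        # Re-indent the expanded body to the indentation of the reference line.
        lines.extend(match["indent"] + sub if sub else sub for sub in expanded.splitlines())
    return "\n".join(lines) + "\n"


chunks = {
    "helpers": 'def helper():\n    return "ready"\n',
    "main": "<<helpers>>\n\ndef main():\n    return helper()\n",
}
tangled = expand_chunk("main", chunks)
```

Carrying the reference line's indentation into the expanded body matters for Python targets, where a chunk referenced inside a function must land at the right depth.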
@@ -34,6 +34,27 @@ title: Original
     assert "## Intro" in result.markdown
     assert "### Detail" in result.markdown
     assert result.operations == ["set_frontmatter", "shift_headings:1"]
+    assert [event.operation for event in result.provenance] == [
+        "set_frontmatter",
+        "shift_headings",
+    ]
+
+
+def test_transform_shifts_headings_without_touching_fenced_code():
+    markdown = """# Intro
+
+```markdown
+# Literal Heading
+```
+
+## Real Heading
+"""
+
+    result = transform_markdown(markdown, heading_delta=1)
+
+    assert "```markdown\n# Literal Heading\n```" in result.markdown
+    assert "### Real Heading" in result.markdown
+    assert result.provenance[0].metadata["affected_lines"] == [1, 7]


 def test_transform_extracts_selector_text():
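The heading-shift test in this hunk expects lines inside a fenced block to stay literal and expects the affected line numbers to be recorded. A minimal sketch of that idea, assuming a simple line scanner rather than a real Markdown tokenizer: `shift_headings` here is a hypothetical function, not `transform_markdown`, and it does not handle indented code blocks or nested fence styles.

```python
FENCE_MARKERS = ("`" * 3, "~" * 3)


def shift_headings(markdown, delta):
    """Shift ATX heading levels by `delta`, skipping lines inside fenced code."""
    out, affected, in_fence = [], [], False
    for number, line in enumerate(markdown.splitlines(keepends=True), start=1):
        if line.lstrip().startswith(FENCE_MARKERS):
            # Simplification: any fence marker toggles the fenced state.
            in_fence = not in_fence
        elif not in_fence and line.startswith("#"):
            hashes = len(line) - len(line.lstrip("#"))
            if hashes <= 6 and line[hashes:hashes + 1] == " ":
                level = min(6, max(1, hashes + delta))
                line = "#" * level + line[hashes:]
                affected.append(number)
        out.append(line)
    return "".join(out), affected


fence = "`" * 3
doc = f"# Intro\n\n{fence}markdown\n# Literal Heading\n{fence}\n\n## Real Heading\n"
shifted, affected = shift_headings(doc, 1)
```

Returning the touched line numbers alongside the text is what lets a caller attach them to a provenance event, as the `affected_lines` assertion suggests.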
@@ -104,6 +125,25 @@ def test_resolve_includes_supports_brace_shorthand(tmp_path: Path):
     assert "Before" in result.markdown
     assert "Included body." in result.markdown
     assert "After" in result.markdown
+    assert result.provenance[0].operation == "include"
+    assert result.provenance[0].target_path == str(partial.resolve())
+
+
+def test_resolve_includes_ignores_markers_inside_fenced_code(tmp_path: Path):
+    partial = tmp_path / "partial.md"
+    partial.write_text("Included body.", encoding="utf-8")
+    markdown = """```markdown
+{{include:partial.md}}
+```
+
+{{include:partial.md}}
+"""
+
+    result = resolve_includes(markdown, base_dir=tmp_path)
+
+    assert result.markdown.count("Included body.") == 1
+    assert "{{include:partial.md}}" in result.markdown
+    assert result.included_paths == [str(partial.resolve())]


 def test_resolve_includes_rejects_cycles(tmp_path: Path):
105
tests/test_processor_registry.py
Normal file
@@ -0,0 +1,105 @@
from pathlib import Path

from click.testing import CliRunner

from markitect_tool.cli import main
from markitect_tool.core import parse_markdown
from markitect_tool.processor import (
    ProcessorContext,
    default_processor_registry,
    discover_fenced_processors,
    run_fenced_processors,
)
from markitect_tool.reference import load_namespaces


def test_discover_fenced_processors_from_language_prefix():
    markdown = """# Doc

```mkt-uppercase {#shout}
hello
```
"""

    blocks = discover_fenced_processors(markdown, source_path="doc.md")

    assert len(blocks) == 1
    assert blocks[0].processor == "uppercase"
    assert blocks[0].unit_id == "shout"
    assert blocks[0].line_start == 3


def test_default_registry_runs_uppercase_processor():
    markdown = """```mkt-uppercase {#shout}
hello
```
"""
    context = ProcessorContext()

    run = run_fenced_processors(markdown, context=context)

    assert run.valid
    assert run.results[0].content == "HELLO\n"
    assert run.results[0].provenance[0].operation == "processor.uppercase"


def test_include_processor_uses_reference_resolver(tmp_path: Path):
    source = tmp_path / "doc.md"
    partial = tmp_path / "partial.md"
    source.write_text(
        """---
namespaces:
  local: .
---

```mkt-include {#intro ref="local:partial.md#summary"}
```
""",
        encoding="utf-8",
    )
    partial.write_text("# Partial\n\n## Summary\n\nIncluded summary.\n", encoding="utf-8")
    document = parse_markdown(source.read_text(encoding="utf-8"), source_path=str(source))
    context = ProcessorContext(
        root=tmp_path,
        current_path=source,
        namespaces=load_namespaces(document.frontmatter),
    )

    run = run_fenced_processors(source.read_text(encoding="utf-8"), context=context)

    assert run.valid
    assert run.results[0].dependencies == [str(partial.resolve())]
    assert "Included summary" in run.results[0].content


def test_unknown_processor_returns_diagnostic():
    markdown = """```mkt-nope {#x}
content
```
"""
    registry = default_processor_registry()

    run = run_fenced_processors(markdown, context=ProcessorContext(), registry=registry)

    assert not run.valid
    assert run.results[0].diagnostics[0].code == "processor.unknown"


def test_mkt_process_outputs_text(tmp_path: Path):
    source = tmp_path / "doc.md"
    source.write_text(
        """# Doc

```mkt-uppercase {#shout}
hello
```
""",
        encoding="utf-8",
    )

    result = CliRunner().invoke(main, ["process", str(source), "--root", str(tmp_path)])

    assert result.exit_code == 0
    assert "valid" in result.output
    assert "uppercase shout" in result.output
    assert "HELLO" in result.output
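The tests above treat a fenced block whose language starts with `mkt-` as a processor invocation, and expect an unknown processor to produce a diagnostic rather than an exception. The sketch below illustrates that shape with plain dictionaries. `discover_blocks`, `run_blocks`, and `REGISTRY` are hypothetical names chosen for this example; the real `markitect_tool.processor` package uses its own result types, context, and provenance, none of which are modelled here.

```python
import re

# Opening fence: three backticks, mkt-<name>, optional {#id ...} attributes.
FENCE_OPEN = re.compile(r"^`{3}mkt-(?P<name>[\w-]+)(?:\s+\{#(?P<id>[\w-]+)[^}]*\})?\s*$")
FENCE_CLOSE = "`" * 3


def discover_blocks(markdown):
    """Collect fenced blocks whose language starts with the mkt- prefix."""
    blocks, current = [], None
    for number, line in enumerate(markdown.splitlines(), start=1):
        if current is None:
            match = FENCE_OPEN.match(line)
            if match:
                current = {"processor": match["name"], "unit_id": match["id"],
                           "line_start": number, "lines": []}
        elif line.strip() == FENCE_CLOSE:
            blocks.append(current)
            current = None
        else:
            current["lines"].append(line)
    return blocks


# Hypothetical registry: processor name -> pure function from text to text.
REGISTRY = {"identity": lambda text: text, "uppercase": str.upper}


def run_blocks(markdown, registry=REGISTRY):
    """Run each block; an unknown processor yields a diagnostic code."""
    results = []
    for block in discover_blocks(markdown):
        text = "".join(line + "\n" for line in block["lines"])
        handler = registry.get(block["processor"])
        if handler is None:
            results.append({"unit_id": block["unit_id"], "content": None,
                            "diagnostics": ["processor.unknown"]})
        else:
            results.append({"unit_id": block["unit_id"], "content": handler(text),
                            "diagnostics": []})
    return results


fence = "`" * 3
doc = f"# Doc\n\n{fence}mkt-uppercase {{#shout}}\nhello\n{fence}\n"
blocks = discover_blocks(doc)
results = run_blocks(doc)
```

Keeping the registry an explicit mapping of pure functions is one way to keep processing deterministic, in line with the workplan's note that arbitrary code and LLM execution stay outside the registry.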
195
tests/test_reference_resolution.py
Normal file
@@ -0,0 +1,195 @@
from pathlib import Path

import pytest
from click.testing import CliRunner

from markitect_tool.cli import main
from markitect_tool.core import parse_markdown
from markitect_tool.reference import (
    ReferenceContext,
    ReferenceResolutionError,
    load_namespaces,
    parse_reference,
    resolve_reference,
)


def test_parse_reference_splits_namespace_fragment_and_selector():
    address = parse_reference("std:clauses/payment.md#section:fees::blocks[type=code]")

    assert address.namespace == "std"
    assert address.address == "clauses/payment.md"
    assert address.fragment == "section:fees"
    assert address.selector == "blocks[type=code]"


def test_load_namespaces_accepts_optional_colon_suffix():
    namespaces = load_namespaces({"namespaces": {"std:": "./standard", "src": "../src"}})

    assert namespaces == {"std": "./standard", "src": "../src"}


def test_resolve_path_reference_returns_document_unit(tmp_path: Path):
    context_file = tmp_path / "context.md"
    target_file = tmp_path / "target.md"
    context_file.write_text("# Context\n", encoding="utf-8")
    target_file.write_text("---\nid: target-doc\ntitle: Target\n---\n\n# Target\n\nBody.", encoding="utf-8")
    context = ReferenceContext(root=tmp_path, current_path=context_file)

    resolution = resolve_reference("target.md", context=context)

    assert resolution.target_path == str(target_file.resolve())
    assert len(resolution.units) == 1
    assert resolution.units[0].kind == "document"
    assert resolution.units[0].unit_id == "target-doc"
    assert "# Target" in resolution.units[0].text


def test_resolve_namespace_reference_and_explicit_section_id(tmp_path: Path):
    standard = tmp_path / "standard"
    standard.mkdir()
    context_file = tmp_path / "context.md"
    clause_file = standard / "clauses.md"
    context_file.write_text(
        "---\nnamespaces:\n std: ./standard\n---\n\n# Context\n",
        encoding="utf-8",
    )
    clause_file.write_text(
        "# Clauses\n\n## Payment Terms {#payment-terms}\n\nPay within 30 days.\n",
        encoding="utf-8",
    )
    document = parse_markdown(context_file.read_text(encoding="utf-8"), source_path=str(context_file))
    context = ReferenceContext.from_document(document, root=tmp_path)

    resolution = resolve_reference("std:clauses.md#section:payment-terms", context=context)

    assert resolution.units[0].kind == "section"
    assert resolution.units[0].unit_id == "payment-terms"
    assert resolution.units[0].name == "Payment Terms"
    assert "Pay within 30 days" in resolution.units[0].text


def test_resolve_selector_reference_uses_existing_query_engine(tmp_path: Path):
    standard = tmp_path / "standard"
    standard.mkdir()
    context_file = tmp_path / "context.md"
    source_file = standard / "clauses.md"
    context_file.write_text(
        "---\nnamespaces:\n std: ./standard\n---\n\n# Context\n",
        encoding="utf-8",
    )
    source_file.write_text(
        "# Clauses\n\n## Warranty\n\nWarranty text.\n\n## Liability\n\nLiability text.\n",
        encoding="utf-8",
    )
    context = ReferenceContext.from_document(parse_markdown(context_file.read_text(encoding="utf-8"), str(context_file)), root=tmp_path)

    resolution = resolve_reference("std:clauses.md::sections[heading=Warranty]", context=context)

    assert [unit.kind for unit in resolution.units] == ["section"]
    assert resolution.units[0].name == "Warranty"
    assert "Liability" not in resolution.units[0].text


def test_resolve_pathless_fragment_uses_current_document(tmp_path: Path):
    context_file = tmp_path / "context.md"
    context_file.write_text("# Context\n\n## Overview\n\nUseful local context.\n", encoding="utf-8")
    context = ReferenceContext(root=tmp_path, current_path=context_file)

    resolution = resolve_reference("#overview", context=context)

    assert resolution.target_path == str(context_file.resolve())
    assert resolution.units[0].kind == "section"
    assert resolution.units[0].unit_id == "overview"
    assert "Useful local context" in resolution.units[0].text


def test_resolve_named_region_by_id_and_tag(tmp_path: Path):
    context_file = tmp_path / "context.md"
    context_file.write_text(
        """# Context

<!-- mkt:region id="overview" tags="reuse summary" -->
Reusable region text.
<!-- /mkt:region -->
""",
        encoding="utf-8",
    )
    context = ReferenceContext(root=tmp_path, current_path=context_file)

    by_id = resolve_reference("#region:overview", context=context)
    by_tag = resolve_reference("#tag:summary", context=context)

    assert by_id.units[0].kind == "region"
    assert by_id.units[0].text == "Reusable region text."
    assert by_tag.units[0].unit_id == "overview"


def test_resolve_fenced_block_by_id(tmp_path: Path):
    context_file = tmp_path / "context.md"
    context_file.write_text(
        """# Context

```python {#load-config tags="code setup" tangle="src/config.py"}
def load_config():
    return {}
```
""",
        encoding="utf-8",
    )
    context = ReferenceContext(root=tmp_path, current_path=context_file)

    resolution = resolve_reference("#fence:load-config", context=context)

    assert resolution.units[0].kind == "fenced_block"
    assert resolution.units[0].unit_id == "load-config"
    assert resolution.units[0].metadata["language"] == "python"
    assert resolution.units[0].metadata["attrs"]["tangle"] == "src/config.py"
    assert "def load_config" in resolution.units[0].text


def test_resolve_line_range_fragment(tmp_path: Path):
    context_file = tmp_path / "context.md"
    context_file.write_text("# Context\n\nLine A\nLine B\nLine C\n", encoding="utf-8")
    context = ReferenceContext(root=tmp_path, current_path=context_file)

    resolution = resolve_reference("#line:3-4", context=context)

    assert resolution.units[0].kind == "line_range"
    assert resolution.units[0].span.line_start == 3
    assert resolution.units[0].text == "Line A\nLine B"


def test_resolve_rejects_unknown_namespace(tmp_path: Path):
    context_file = tmp_path / "context.md"
    context_file.write_text("# Context\n", encoding="utf-8")
    context = ReferenceContext(root=tmp_path, current_path=context_file)

    with pytest.raises(ReferenceResolutionError, match="Unknown namespace"):
        resolve_reference("missing:doc.md", context=context)


def test_resolve_rejects_paths_outside_root(tmp_path: Path):
    context_file = tmp_path / "context.md"
    context_file.write_text("# Context\n", encoding="utf-8")
    context = ReferenceContext(root=tmp_path, current_path=context_file)

    with pytest.raises(ReferenceResolutionError, match="escapes root"):
        resolve_reference("../outside.md", context=context)


def test_mkt_ref_resolve_outputs_text(tmp_path: Path):
    context_file = tmp_path / "context.md"
    target_file = tmp_path / "target.md"
    context_file.write_text("# Context\n", encoding="utf-8")
    target_file.write_text("# Target\n\n## Decision\n\nChosen.", encoding="utf-8")

    result = CliRunner().invoke(
        main,
        ["ref", "resolve", str(context_file), "target.md#decision", "--root", str(tmp_path)],
    )

    assert result.exit_code == 0
    assert "1 unit(s)" in result.output
    assert "section decision" in result.output
    assert "Decision" in result.output
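The reference tests above imply an address grammar of the form `namespace:address#fragment::selector`, with every part optional, plus a check that resolved paths stay under the root. The sketch below shows one way to split such a string and to bound a path. It is an assumption drawn only from the test inputs: `Address`, `parse_reference_sketch`, and `resolve_within_root` are hypothetical names, and the real `parse_reference` may treat edge cases differently.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class Address(NamedTuple):
    namespace: Optional[str]
    address: str
    fragment: Optional[str]
    selector: Optional[str]


def parse_reference_sketch(reference: str) -> Address:
    """Split on '::' first, then '#', then the namespace colon."""
    rest, sep, selector = reference.partition("::")
    selector = selector if sep else None
    # The fragment is split off before the namespace because fragments such as
    # "section:fees" contain a colon of their own.
    rest, sep, fragment = rest.partition("#")
    fragment = fragment if sep else None
    prefix, sep, tail = rest.partition(":")
    if sep and prefix and "/" not in prefix:
        return Address(prefix, tail, fragment, selector)
    return Address(None, rest, fragment, selector)


def resolve_within_root(root: Path, base_dir: Path, address: str) -> Path:
    """Resolve `address` relative to `base_dir`, rejecting paths outside `root`."""
    target = (base_dir / address).resolve()
    resolved_root = root.resolve()
    if resolved_root != target and resolved_root not in target.parents:
        raise ValueError(f"path escapes root: {address}")
    return target


parsed = parse_reference_sketch("std:clauses/payment.md#section:fees::blocks[type=code]")
```

Resolving both paths before comparing them is what catches `..` segments; comparing the raw strings would let `../outside.md` through.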
60
tests/test_wp0010_migration_examples.py
Normal file
@@ -0,0 +1,60 @@
from pathlib import Path

from markitect_tool.core import parse_markdown_file
from markitect_tool.explode import explode_markdown_file, implode_markdown_directory
from markitect_tool.ops import resolve_includes
from markitect_tool.processor import ProcessorContext, run_fenced_processors
from markitect_tool.reference import load_namespaces
from markitect_tool.literate import tangle_markdown


EXAMPLES = Path("examples/migration")


def test_migration_explode_example_roundtrips(tmp_path: Path):
    source = EXAMPLES / "legacy-explode-source.md"
    original = source.read_text(encoding="utf-8")

    explode_markdown_file(source, tmp_path / "exploded", variant="hierarchical")
    result = implode_markdown_directory(tmp_path / "exploded")

    assert result.markdown == original


def test_migration_reference_backed_transclusion_example():
    source = EXAMPLES / "legacy-transclusion-context.md"
    document = parse_markdown_file(source)
    context = ProcessorContext(
        root=EXAMPLES,
        current_path=source,
        namespaces=load_namespaces(document.frontmatter),
    )

    result = run_fenced_processors(source.read_text(encoding="utf-8"), context=context)

    assert result.valid
    assert "Payment is due within 30 days" in result.results[0].content


def test_migration_path_include_example():
    source = EXAMPLES / "legacy-path-include.md"

    result = resolve_includes(
        source.read_text(encoding="utf-8"),
        base_dir=EXAMPLES,
        current_path=source,
    )

    assert "## Warranty" in result.markdown
    assert "Warranty begins on the effective date" in result.markdown


def test_migration_literate_example_tangles():
    source = EXAMPLES / "legacy-literate.md"

    result = tangle_markdown(source.read_text(encoding="utf-8"), source_path=source)

    assert result.valid
    assert result.files[0].path == "src/app.py"
    assert "CONFIG" in result.files[0].content
    assert "<<config>>" not in result.files[0].content
@@ -3,7 +3,7 @@ id: MKTT-WP-0010
 type: workplan
 title: "Content References, Processors, and Literate Workflows"
 domain: markitect
-status: todo
+status: done
 owner: markitect-tool
 topic_slug: markitect
 planning_priority: P1
@@ -55,7 +55,7 @@ See `docs/content-reference-literate-workflow-research.md`.

 ```task
 id: MKTT-WP-0010-T001
-status: todo
+status: done
 priority: high
 state_hub_task_id: "f70d2b9d-151b-46c6-9613-bd6bdbf164e7"
 ```
@@ -66,11 +66,18 @@ resolver inputs/outputs, and error cases.
 Output: reference model docs, examples, and tests for path, namespace, selector,
 and ID resolution.

+Initial implementation completed with a `reference` extension package,
+frontmatter namespace loading, root-bounded path resolution, existing query
+selector reuse, heading/section/block fragment IDs, CLI access via
+`mkt ref resolve`, reference docs, examples, and tests. Region/tag/fenced-block
+addressing continues in P10.3; processor dependency/provenance use continues in
+P10.2 and P10.5.
+
 ## P10.2 - Add token-safe transforms and operation provenance

 ```task
 id: MKTT-WP-0010-T002
-status: todo
+status: done
 priority: high
 state_hub_task_id: "e35639b7-756f-4993-8b3c-2e58b23e0eca"
 ```
@@ -80,11 +87,17 @@ structured operation provenance, dependency edges, source spans, and diagnostics

 Output: token-safe transform implementation and provenance result envelope.

+Initial implementation completed with token-safe heading shifts, include
+markers that stay literal inside fenced or indented code blocks, additive
+`OperationProvenance` events on transform/include results, dependency edges for
+resolved includes, docs, and regression tests. Rich structured diagnostics and
+source maps continue through P10.3, P10.4, and P10.5.
+
 ## P10.3 - Implement named regions and addressable block selectors

 ```task
 id: MKTT-WP-0010-T003
-status: todo
+status: done
 priority: high
 state_hub_task_id: "98cafe28-a364-48f1-ae55-cb47c71d9441"
 ```
@@ -94,11 +107,17 @@ selection by ID/tag/line range where appropriate.

 Output: region parser/resolver, CLI examples, and source-snippet tests.

+Initial implementation completed as reference-layer extensions: named
+`mkt:region` comments, region tags, fenced-block IDs and tags from info-string
+attributes, `#line:start-end` ranges, convenience ID lookup ordering, docs,
+examples, and tests. Deeper source maps and processor-owned block semantics
+continue in P10.5 and P10.6.
+
 ## P10.4 - Reimplement reversible explode/implode variants

 ```task
 id: MKTT-WP-0010-T004
-status: todo
+status: done
 priority: high
 state_hub_task_id: "67f77aa1-a7ee-485c-891e-6ae7ecc52067"
 ```
@@ -111,11 +130,16 @@ reference and processor model is stable.

 Output: `mkt explode`, `mkt implode`, manifest schema, roundtrip tests.

+Initial implementation completed with a separate `explode` extension package,
+manifest-first flat and hierarchical variants, exact roundtrip implode,
+non-empty output protection, CLI commands, docs, and tests. Semantic variants
+remain deferred until processor and content-class semantics are stable.
+
 ## P10.5 - Define processor registry for fenced blocks

 ```task
 id: MKTT-WP-0010-T005
-status: todo
+status: done
 priority: high
 state_hub_task_id: "eb7cde08-8a73-4163-ac54-19a2bc7b5f88"
 ```
@@ -126,11 +150,18 @@ and return generated content/files, diagnostics, dependencies, and provenance.

 Output: processor registry API, deterministic built-in processors, and tests.

+Initial implementation completed with a deterministic `processor` extension
+package, fenced-block discovery, explicit registry, context/policy envelope,
+result files/diagnostics/dependencies/provenance, built-in identity,
+uppercase, and reference-backed include processors, CLI `mkt process`, docs,
+examples, and tests. Arbitrary code or LLM execution remains intentionally
+outside this deterministic registry floor.
+
 ## P10.6 - Implement literate weave/tangle MVP

 ```task
 id: MKTT-WP-0010-T006
-status: todo
+status: done
 priority: high
 state_hub_task_id: "090fcc38-758b-4414-b941-40f217eb17ca"
 ```
@@ -141,11 +172,16 @@ cross-references.

 Output: `mkt tangle`, `mkt weave`, chunk-reference diagnostics, examples.

+Initial implementation completed with a `literate` extension package, named
+fenced code chunks, `tangle` targets, noweb-style `<<chunk-id>>` expansion,
+missing/cyclic chunk diagnostics, deterministic file writing, woven chunk
+index output, CLI `mkt tangle`/`mkt weave`, docs, examples, and tests.
+
 ## P10.7 - Design content class composition and multi-inheritance

 ```task
 id: MKTT-WP-0010-T007
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "220e6b27-2d7b-4c22-b5e8-304198ecfea8"
 ```
@@ -156,11 +192,16 @@ diagnostics.

 Output: architecture note, examples, and a small deterministic resolver spike.

+Initial implementation completed with a `content_class` extension package,
+C3-style deterministic linearization, explicit slot merge policies, conflict
+diagnostics, CLI `mkt class resolve`, docs, examples, and tests. Markdown
+instantiation and snippet injection remain deferred to later integration work.
+
 ## P10.8 - Add migration examples from markitect-main

 ```task
 id: MKTT-WP-0010-T008
-status: todo
+status: done
 priority: high
 state_hub_task_id: "287637d3-1997-43b2-b97d-10587d565cec"
 ```
@@ -169,3 +210,9 @@ Translate the relevant old explode/implode, transclusion, and spaces reference
 graph tests into successor-style fixtures and examples.

 Output: migration test inventory, example documents, and parity notes.
+
+Initial implementation completed with WP-0010 migration parity notes,
+successor-style examples for explode/implode, path include, reference-backed
+transclusion, and literate tangling, plus tests that exercise these examples.
+Legacy platform, database, infospace, rendering, and provider-specific
+behaviors remain intentionally out of scope.