generated from coulomb/repo-seed
extension for ref resolve, explode, implode, weave, tangle

docs/content-classes.md
@@ -0,0 +1,79 @@
# Content Classes

Date: 2026-05-04

## Purpose

Content classes are data-defined composition rules for reusable document
structures, overlays, and variants. They are not Python inheritance. They are a
deterministic way to combine slots such as sections, assertions, snippets,
processors, and style guidance.

This is the P10.7 resolver spike for future class/object-style workflows.

## Model

A class can declare:

- `extends`: parent classes
- `slots`: structured values to contribute
- `merge_policies`: per-slot merge behavior

Example:

```yaml
classes:
  base-prd:
    slots:
      sections:
        - Problem
        - Decision
  enterprise:
    extends:
      - base-prd
    slots:
      sections:
        - Compliance
    merge_policies:
      sections: append
```

## Linearization

Multiple inheritance uses a C3-style linearization. That gives us:

- deterministic parent ordering
- monotonic inheritance behavior
- explicit diagnostics for cycles, unknown parents, and inconsistent precedence

The resolved class is merged from base to leaf according to the computed
linearization.
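
For illustration, here is a minimal Python sketch of C3 linearization over `extends` lists. It assumes classes arrive as a plain mapping from class name to parent names. `c3_linearize` is a hypothetical helper, not the resolver's API, and it raises exceptions where the real resolver reports diagnostics.

```python
def c3_linearize(name, parents, _visiting=()):
    """Return the C3 linearization of `name`, most-derived class first.

    `parents` maps each class name to its direct parents (its `extends` list).
    Hypothetical helper: errors are raised instead of collected as diagnostics.
    """
    if name in _visiting:
        raise ValueError("inheritance cycle: " + " -> ".join((*_visiting, name)))
    if name not in parents:
        raise KeyError(f"unknown class: {name}")
    direct = list(parents[name])
    # Merge each parent's linearization plus the local parent order.
    sequences = [c3_linearize(p, parents, (*_visiting, name)) for p in direct]
    sequences.append(direct)
    result = [name]
    while any(sequences):
        for seq in sequences:
            if not seq:
                continue
            head = seq[0]
            # A head is eligible only if it appears in no other sequence's tail.
            if not any(head in other[1:] for other in sequences):
                break
        else:
            raise ValueError(f"inconsistent precedence while linearizing {name}")
        result.append(head)
        for seq in sequences:
            if seq and seq[0] == head:
                seq.pop(0)
    return result


parents = {"base-prd": [], "enterprise": ["base-prd"], "enterprise-prd": ["enterprise"]}
print(c3_linearize("enterprise-prd", parents))  # → ['enterprise-prd', 'enterprise', 'base-prd']
```

The list is ordered most-derived first, so a base-to-leaf merge walks it in reverse.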

## Merge Policies

Initial policies:

- `replace`
- `append`
- `prepend`
- `deep_merge`
- `error_on_conflict`

Unknown policies and invalid value shapes produce diagnostics.
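
The sketch below shows one way the five policies could combine slot values. It assumes that `append`/`prepend` apply to lists, that `deep_merge` applies to mappings, and that a slot with no declared policy falls back to `replace`. `merge_slot` and `resolve_slots` are hypothetical names. The sample data mirrors `examples/classes/prd-classes.yaml`, and errors are raised rather than collected as diagnostics.

```python
import copy

POLICIES = {"replace", "append", "prepend", "deep_merge", "error_on_conflict"}


def merge_slot(current, incoming, policy, slot):
    """Combine an inherited slot value with a more-derived class's value."""
    if policy not in POLICIES:
        raise ValueError(f"{slot}: unknown merge policy {policy!r}")
    if current is None or policy == "replace":
        return copy.deepcopy(incoming)
    if policy in ("append", "prepend"):
        if not (isinstance(current, list) and isinstance(incoming, list)):
            raise TypeError(f"{slot}: {policy} requires list values")
        return current + incoming if policy == "append" else incoming + current
    if policy == "deep_merge":
        if not (isinstance(current, dict) and isinstance(incoming, dict)):
            raise TypeError(f"{slot}: deep_merge requires mapping values")
        merged = dict(current)
        for key, value in incoming.items():
            if isinstance(merged.get(key), dict) and isinstance(value, dict):
                merged[key] = merge_slot(merged[key], value, "deep_merge", f"{slot}.{key}")
            else:
                merged[key] = copy.deepcopy(value)
        return merged
    # error_on_conflict: identical values are accepted, anything else fails.
    if current != incoming:
        raise ValueError(f"{slot}: conflicting values {current!r} and {incoming!r}")
    return current


def resolve_slots(linearization, classes):
    """Merge slots from base to leaf; `linearization` is most-derived first."""
    slots = {}
    for name in reversed(linearization):
        spec = classes[name]
        policies = spec.get("merge_policies", {})
        for slot, value in spec.get("slots", {}).items():
            # Assumption: a slot without a declared policy uses `replace`.
            slots[slot] = merge_slot(slots.get(slot), value, policies.get(slot, "replace"), slot)
    return slots


classes = {
    "base-prd": {"slots": {"sections": ["Problem", "Decision"],
                           "assertions": {"tone": "plain", "audience": "product"}}},
    "enterprise": {"slots": {"sections": ["Compliance"],
                             "assertions": {"audience": "enterprise buyers"}},
                   "merge_policies": {"sections": "append", "assertions": "deep_merge"}},
    "enterprise-prd": {"slots": {"sections": ["Rollout"]},
                       "merge_policies": {"sections": "append"}},
}
print(resolve_slots(["enterprise-prd", "enterprise", "base-prd"], classes))
# → {'sections': ['Problem', 'Decision', 'Compliance', 'Rollout'], 'assertions': {'tone': 'plain', 'audience': 'enterprise buyers'}}
```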

## CLI

Resolve a class:

```bash
mkt class resolve examples/classes/prd-classes.yaml enterprise-prd
```

JSON/YAML output includes the linearization, merged slots, and diagnostics.

## Extension Boundary

The current resolver does not yet instantiate Markdown documents or inject
snippets. It establishes the deterministic inheritance and merge floor. Later
work can connect resolved slots to contracts, references, processors, and
generation plans.

docs/content-references.md
@@ -0,0 +1,139 @@
# Content References

Date: 2026-05-04

## Purpose

Content references are the first WP-0010 extension layer. They give Markitect a
shared way to name and resolve Markdown content units without changing the
existing parser, query, transform, compose, include, contract, or cache APIs.

The goal is a small resolver that later features can reuse:

- includes can accept references as well as paths
- explode/implode can write manifests with stable unit IDs
- processors can receive typed units and dependency edges
- tangle/weave can address chunks and generated outputs
- cache and access-control backends can index the same IDs

## Reference Syntax

References are compact strings:

```text
path/to/file.md
path/to/file.md#section:introduction
path/to/file.md::sections[heading=Decision]
std:clauses/payment.md
std:clauses/payment.md#payment-terms
std:clauses/payment.md#region:boilerplate
std:clauses/payment.md#tag:legal
#local-section
```

The parts are:

- `namespace:`: optional namespace declared in frontmatter
- `path`: a Markdown file path relative to the current document, or relative to
  the namespace target
- `#fragment`: optional unit lookup inside the target document
- `::selector`: optional selector in the existing Markitect query syntax

Fragments and selectors are mutually exclusive during resolution. Selectors are
delegated to the existing query engine, which keeps this layer small and avoids
inventing a second query language.
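
The reference grammar above can be split with a few string partitions. This is a hedged sketch: `RefParts` and `split_reference` are hypothetical and do not describe the exported `parse_reference`. It assumes `::` is split off first, then `#`, and that any remaining `:` marks a namespace.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RefParts:
    namespace: Optional[str]
    path: Optional[str]
    fragment: Optional[str]
    selector: Optional[str]


def split_reference(ref: str) -> RefParts:
    # 1. Split off a `::selector` first so selector text is never re-parsed.
    body, sep, selector = ref.partition("::")
    selector = selector if sep else None
    # 2. Split off a `#fragment`; fragments may themselves contain ':'.
    body, sep, fragment = body.partition("#")
    fragment = fragment if sep else None
    if fragment is not None and selector is not None:
        raise ValueError("fragments and selectors are mutually exclusive")
    # 3. Whatever ':' remains separates a frontmatter namespace from the path.
    namespace = None
    if ":" in body:
        namespace, _, body = body.partition(":")
    # An empty path (e.g. "#local-section") means "the current document".
    return RefParts(namespace, body or None, fragment, selector)


print(split_reference("std:clauses/payment.md#region:boilerplate"))
```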

## Namespaces

Namespaces live in Markdown frontmatter:

```yaml
---
namespaces:
  std: ./standard
  product: ../product-docs
---
```

Namespace keys may be written with or without a trailing colon. Namespace values
are string paths. Relative namespace paths resolve under the resolver root. All
resolved file paths must stay inside that root.
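
Here is a sketch of namespace lookup with the root containment check, using only `pathlib` (Python 3.9+ for `Path.is_relative_to`). `resolve_namespace_path` is a hypothetical helper. Treating absolute namespace values as-is, while still applying the root check, is an assumption.

```python
from pathlib import Path


def resolve_namespace_path(root: Path, namespaces: dict, namespace: str, rel_path: str) -> Path:
    """Map `namespace:rel_path` to a file path that must stay under `root`."""
    # Frontmatter keys may be written as "std" or "std:"; normalize both forms.
    table = {key.rstrip(":"): value for key, value in namespaces.items()}
    if namespace not in table:
        raise KeyError(f"unknown namespace: {namespace}")
    root = root.resolve()
    base = Path(table[namespace])
    if not base.is_absolute():
        base = root / base  # relative namespace paths resolve under the root
    target = (base / rel_path).resolve()
    # Reject paths that escape the root, e.g. via "../" segments or symlinks.
    if not target.is_relative_to(root):
        raise ValueError(f"{target} is outside the resolver root {root}")
    return target
```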

## Content Units

The resolver currently emits these unit kinds:

- `document`: full Markdown file
- `section`: heading-led Markdown section
- `heading`: heading line
- existing query kinds such as `frontmatter`, `block`, `metrics`, or `section`

Each unit includes:

- `unit_id`: stable local ID
- `kind`
- `source_path`
- source line span when available
- `name`
- `content_hash`
- raw text
- metadata from the source or query match

Heading and section IDs use an explicit trailing heading ID when present:

```markdown
## Payment Terms {#payment-terms}
```

Otherwise the resolver derives a slug from the heading text and adds numeric
suffixes for collisions.
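
A sketch of ID assignment follows. The exact slug rules are not specified here, so this sketch assumes lowercase ASCII slugs with hyphen separators and `-1`, `-2`, ... collision suffixes. `heading_ids` is a hypothetical helper.

```python
import re

EXPLICIT_ID = re.compile(r"\s*\{#([\w-]+)\}\s*$")


def heading_ids(headings: list) -> list:
    """Assign IDs to heading texts (without the leading #'s) in document order."""
    taken = set()
    ids = []
    for text in headings:
        match = EXPLICIT_ID.search(text)
        if match:
            unit_id = match.group(1)  # an explicit trailing {#id} wins
        else:
            # Assumed slug rule: lowercase, runs of non-alphanumerics become "-".
            base = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-") or "section"
            unit_id, suffix = base, 1
            while unit_id in taken:  # assumed collision scheme: -1, -2, ...
                unit_id = f"{base}-{suffix}"
                suffix += 1
        taken.add(unit_id)
        ids.append(unit_id)
    return ids


print(heading_ids(["Payment Terms {#payment-terms}", "Warranty", "Warranty"]))
# → ['payment-terms', 'warranty', 'warranty-1']
```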

Named regions use HTML comments so they can live in Markdown and many source
files without changing the rendered document:

```markdown
<!-- mkt:region id="boilerplate" tags="legal reuse" -->
Reusable text.
<!-- /mkt:region -->
```
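
Region markers can be recognized line by line. In this sketch (`find_regions` is hypothetical), regions may nest. `tags` is split on whitespace so that a `#tag:<tag>` lookup can match any listed tag, and the recorded span runs from the opening marker line to the closing marker line.

```python
import re

REGION_OPEN = re.compile(r"<!--\s*mkt:region\s+(?P<attrs>.*?)\s*-->")
REGION_CLOSE = re.compile(r"<!--\s*/mkt:region\s*-->")
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def find_regions(text: str) -> dict:
    """Return {region_id: {"tags", "span", "text"}} for mkt:region comments."""
    regions = {}
    open_regions = []  # stack of (attrs, start_line, body_lines)
    for lineno, line in enumerate(text.splitlines(), start=1):
        opened = REGION_OPEN.search(line)
        if opened:
            attrs = dict(ATTRIBUTE.findall(opened.group("attrs")))
            open_regions.append((attrs, lineno, []))
            continue
        if REGION_CLOSE.search(line):
            if not open_regions:
                raise ValueError(f"line {lineno}: region close without an open marker")
            attrs, start, body = open_regions.pop()
            regions[attrs["id"]] = {
                "tags": attrs.get("tags", "").split(),
                "span": (start, lineno),
                "text": "\n".join(body),
            }
            continue
        # Content lines belong to every region that is currently open.
        for _, _, body in open_regions:
            body.append(line)
    if open_regions:
        raise ValueError("unclosed mkt:region marker")
    return regions
```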

Fenced blocks can be addressed when their info string includes an ID:

````markdown
```python {#load-config tags="code setup" tangle="src/config.py"}
def load_config():
    return {}
```
````
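
The info-string attributes can be parsed with two regular expressions. This sketch (`parse_info_string` is hypothetical) handles only `{#id}` and double-quoted `key="value"` pairs. That covers the examples in this document, but not every attribute form.

```python
import re

INFO = re.compile(r"^(?P<lang>[^\s{]*)\s*(?:\{(?P<attrs>[^}]*)\})?\s*$")
TOKEN = re.compile(r'#(?P<id>[\w-]+)|(?P<key>[\w-]+)="(?P<value>[^"]*)"')


def parse_info_string(info: str) -> dict:
    """Split a fence info string into language, `{#id}`, and key="value" attributes."""
    match = INFO.match(info.strip())
    if match is None:
        raise ValueError(f"unsupported info string: {info!r}")
    result = {"lang": match.group("lang") or None, "id": None, "attrs": {}}
    for token in TOKEN.finditer(match.group("attrs") or ""):
        if token.group("id"):
            result["id"] = token.group("id")
        else:
            result["attrs"][token.group("key")] = token.group("value")
    return result


print(parse_info_string('python {#load-config tags="code setup" tangle="src/config.py"}'))
```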

Supported fragments now include:

- `#section:<id-or-heading-slug>`
- `#heading:<id-or-heading-slug>`
- `#region:<id>`
- `#fence:<id>`
- `#tag:<tag>`
- `#line:<start>` or `#line:<start>-<end>`
- `#<id>` as a convenience lookup across sections, regions, fenced blocks, and
  headings
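
Fragment dispatch can be sketched as below. `classify_fragment` is hypothetical. It takes the fragment without its leading `#`, treats an unrecognized prefix as a plain `#<id>` lookup, and assumes line ranges are 1-based and inclusive.

```python
FRAGMENT_KINDS = {"section", "heading", "region", "fence", "tag", "line"}


def classify_fragment(fragment: str):
    """Return (kind, value) for a fragment written without its leading '#'."""
    kind, sep, value = fragment.partition(":")
    if not sep or kind not in FRAGMENT_KINDS:
        # Plain `#<id>`: a resolver would try sections, regions, fences, headings.
        return ("any", fragment)
    if kind == "line":
        start, _, end = value.partition("-")
        first = int(start)
        last = int(end) if end else first
        if first < 1 or last < first:
            raise ValueError(f"invalid line range: {value!r}")
        return ("line", (first, last))
    return (kind, value)


print(classify_fragment("line:3-5"), classify_fragment("payment-terms"))
```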

## CLI

Resolve a reference from a context document:

```bash
mkt ref resolve examples/references/context.md 'std:clauses.md#payment-terms'
```

JSON and YAML formats include the resolved text and metadata:

```bash
mkt ref resolve examples/references/context.md 'std:clauses.md::sections[heading=Warranty]' --format json
```

## Extension Boundary

This layer is intentionally read-only. It does not replace `mkt include`,
`mkt query`, or `mkt extract`. Instead it defines the address model those tools
can adopt when their next WP-0010 tasks require richer content identity,
processor dependencies, source maps, and reversible manifests.

docs/explode-implode.md
@@ -0,0 +1,69 @@
# Explode and Implode

Date: 2026-05-04

## Purpose

`mkt explode` and `mkt implode` reintroduce the large-document workflow from the
old Markitect as a slim WP-0010 extension. The design is manifest-first: the
exploded directory is editable, but the manifest preserves ordering, source
spans, heading metadata, hashes, frontmatter, and the selected layout variant.

This keeps the operation reversible without requiring a database or service.

## Variants

The initial variants are:

- `flat`: writes ordered section files under `sections/`.
- `hierarchical`: writes child section files below parent heading directories.

Both variants preserve the same manifest model. A later semantic variant can
reuse the reference and processor framework once those layers are stable.
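
To make the two layouts concrete, the sketch below maps headings to relative file paths. The naming scheme (zero-padded `sections/NNN-slug.md` for `flat`, `parent/child.md` for `hierarchical`) is an assumption for illustration, not the layout `mkt explode` actually writes. `entry_paths` is a hypothetical helper.

```python
def entry_paths(headings, variant="flat"):
    """Map (level, slug) headings, in document order, to exploded file paths."""
    if variant not in ("flat", "hierarchical"):
        raise ValueError(f"unknown variant: {variant}")
    paths = []
    ancestors = []  # open (level, slug) headings, used by the hierarchical layout
    for index, (level, slug) in enumerate(headings, start=1):
        if variant == "flat":
            # Hypothetical naming: a zero-padded position keeps files sorted.
            paths.append(f"sections/{index:03d}-{slug}.md")
            continue
        # Close headings at the same or a deeper level before nesting this one.
        while ancestors and ancestors[-1][0] >= level:
            ancestors.pop()
        parent_dir = "/".join(s for _, s in ancestors)
        paths.append(f"{parent_dir}/{slug}.md" if parent_dir else f"{slug}.md")
        ancestors.append((level, slug))
    return paths


headings = [(1, "overview"), (2, "detail"), (1, "follow-up")]
print(entry_paths(headings, "hierarchical"))  # → ['overview.md', 'overview/detail.md', 'follow-up.md']
```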

## CLI

Explode a document:

```bash
mkt explode docs/source.md --output-dir work/source-exploded
```

Use a hierarchical directory shape:

```bash
mkt explode docs/source.md --output-dir work/source-tree --variant hierarchical
```

Implode the directory back into one Markdown file:

```bash
mkt implode work/source-exploded --output docs/source-rebuilt.md
```

By default `mkt explode` refuses to write into a non-empty output directory. Use
`--force` when an explicit overwrite is intended.

## Manifest

The manifest is written as `markitect-explode.yaml` in the output directory.
It records:

- manifest version
- original source path and SHA-256 hash
- variant
- raw frontmatter block
- ordered entries with file path, kind, unit ID, source line span, heading
  metadata, and content hash

Implode reads the manifest entries in order and concatenates the current entry
files. If users edit section files, the rebuilt document reflects those edits
while preserving the original frontmatter and ordering.
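
The manifest-first roundtrip can be sketched in memory. `explode` and `implode` here are hypothetical stand-ins for `mkt explode`/`mkt implode`. The manifest keeps only a subset of the recorded fields, the file names are invented, and, unlike a token-safe splitter, heading detection does not skip fenced code blocks.

```python
import hashlib
import re

HEADING = re.compile(r"^#{1,6}[ \t]")


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def explode(source: str):
    """Split into frontmatter, preamble, and heading-led chunks plus a manifest."""
    frontmatter, body = "", source
    if source.startswith("---\n"):
        end = source.find("\n---\n", 4)
        if end != -1:
            frontmatter, body = source[: end + 5], source[end + 5 :]
    chunks, current = [], []
    for line in body.splitlines(keepends=True):
        # Simplification: fenced code is not skipped, unlike a token-safe splitter.
        if HEADING.match(line) and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))  # text before the first heading survives
    files, entries = {}, []
    for index, chunk in enumerate(chunks, start=1):
        path = f"sections/{index:03d}.md"  # hypothetical file naming
        files[path] = chunk
        entries.append({"file": path, "content_hash": sha256(chunk)})
    manifest = {"version": 1, "source_hash": sha256(source),
                "frontmatter": frontmatter, "entries": entries}
    return manifest, files


def implode(manifest, files):
    """Concatenate the current entry files in manifest order after the frontmatter."""
    return manifest["frontmatter"] + "".join(files[e["file"]] for e in manifest["entries"])
```

If an entry's text is edited before `implode` runs, the rebuilt document reflects the edit, while the frontmatter and entry order stay fixed.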

## Extension Boundary

This implementation is intentionally not semantic yet. It does not infer
contracts, classes, named chunks, or processor outputs. Instead it establishes a
small reversible substrate that later WP-0010 tasks can enrich with regions,
references, processors, source maps, and weave/tangle behavior.

docs/literate-weave-tangle.md
@@ -0,0 +1,79 @@
# Literate Weave and Tangle

Date: 2026-05-04

## Purpose

The literate workflow layer brings a small Knuth-style weave/tangle capability
to Markdown without requiring a separate language. Prose stays in Markdown.
Named code chunks live in fenced blocks. Tangling emits source files.
Weaving keeps the document readable and adds a deterministic chunk index.

## Chunk Syntax

Named chunks use fenced block attributes:

````markdown
```python {#helpers}
def helper():
    return "ready"
```
````

A chunk becomes an output root when it declares `tangle`:

````markdown
```python {#main tangle="src/app.py"}
<<helpers>>

def main():
    return helper()
```
````

Chunk references use noweb-style syntax:

```text
<<helpers>>
```

A chunk reference that occupies a whole line keeps its indentation: each line of
the expanded chunk is prefixed with the reference's leading whitespace.
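
Here is a minimal expansion sketch for whole-line references, with cycle and missing-chunk checks. `expand` and `TangleError` are hypothetical. The exception stands in for structured diagnostics, and an inline reference that shares its line with other text is left untouched.

```python
import re

# Whole-line reference: optional indentation, <<name>>, optional trailing spaces.
REF_LINE = re.compile(r"^(?P<indent>[ \t]*)<<(?P<name>[\w.-]+)>>[ \t]*$")


class TangleError(Exception):
    """Stands in for the structured missing/cyclic-chunk diagnostics."""


def expand(name: str, chunks: dict, stack: tuple = ()) -> str:
    if name in stack:
        raise TangleError("cyclic chunk reference: " + " -> ".join((*stack, name)))
    if name not in chunks:
        raise TangleError(f"missing chunk: {name}")
    out = []
    for line in chunks[name].splitlines():
        match = REF_LINE.match(line)
        if match is None:
            out.append(line)  # ordinary line, or an inline reference left as-is
            continue
        indent = match.group("indent")
        child = expand(match.group("name"), chunks, (*stack, name))
        for expanded in child.splitlines():
            # Prefix every non-blank expanded line with the reference's indentation.
            out.append(indent + expanded if expanded else expanded)
    return "\n".join(out)


chunks = {
    "helpers": 'def helper():\n    return "ready"',
    "main": "<<helpers>>\n\ndef main():\n    return helper()",
}
print(expand("main", chunks))
```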

## CLI

Tangle files:

```bash
mkt tangle examples/literate/app.md --output-dir build/literate
```

Inspect without writing:

```bash
mkt tangle examples/literate/app.md --format json
```

Weave documentation:

```bash
mkt weave examples/literate/app.md --output build/app-woven.md
```

## Diagnostics

Tangling reports structured diagnostics for missing chunks and cyclic chunk
references. The CLI writes tangled files only when the result is valid.

## Extension Boundary

The MVP deliberately keeps the model narrow:

- named fenced blocks
- `tangle="<path>"`
- deterministic document-order concatenation for repeated targets
- noweb-style chunk expansion
- generated chunk index during weave
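
Document-order concatenation for repeated targets can be sketched as follows. `group_tangle_targets` is hypothetical. It assumes chunks are dicts with `text` and an optional `tangle`, and that parts are joined with a single newline. Each root would still go through noweb expansion.

```python
def group_tangle_targets(chunks):
    """Collect `tangle` root chunks per output path, in document order."""
    targets = {}
    for chunk in chunks:  # `chunks` is assumed to be in document order
        target = chunk.get("tangle")
        if target:
            targets.setdefault(target, []).append(chunk["text"])
    # Dicts keep insertion order, so output paths also follow first appearance.
    return {target: "\n".join(parts) for target, parts in targets.items()}


print(group_tangle_targets([
    {"text": "A", "tangle": "src/app.py"},
    {"text": "helper only"},
    {"text": "B", "tangle": "src/app.py"},
]))  # → {'src/app.py': 'A\nB'}
```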

Future extensions can add richer source maps, processor execution,
language-specific extraction, and class/namespace-aware chunk selection without
changing this initial chunk model.

docs/markitect-main-wp0010-migration-notes.md
@@ -0,0 +1,46 @@
# markitect-main WP-0010 Migration Notes

Date: 2026-05-04

## Purpose

This note captures the relevant `markitect-main` ideas that WP-0010 now
preserves in successor form.

The migration is conceptual rather than source-compatible. The successor keeps
Markdown-native behavior and removes old platform, database, infospace, and
service assumptions.

## Parity Map

| Legacy area | Successor shape | Status |
| --- | --- | --- |
| Explode/implode variants | `mkt explode`, `mkt implode`, manifest-first flat/hierarchical variants | Reimplemented |
| Transclusion/includes | `mkt include` for path markers; processor `mkt-include` for reference-backed content | Reimplemented with clearer boundaries |
| Spaces/infospace references | Frontmatter namespaces plus `mkt ref resolve` | Reframed as syntax-layer references |
| Fenced-block processors | Explicit deterministic processor registry | Reimplemented as opt-in extension |
| Literate workflows | `mkt tangle`, `mkt weave`, named fenced chunks, noweb references | Reimplemented as MVP |
| Content classes/overlays | Data-defined classes with C3-style linearization and merge policies | Resolver spike implemented |

## Intentionally Not Migrated

These old concerns stay out of the WP-0010 toolkit layer:

- database-backed infospace lifecycle
- GraphQL/service APIs
- provider-specific LLM execution
- rendering/plugin/browser/editor infrastructure
- project finance, wishlist, and profile tooling

## Migration Examples

Examples live under `examples/migration/`:

- `legacy-explode-source.md`: large document roundtrip via explode/implode.
- `legacy-transclusion-context.md`: namespace-backed reference include.
- `legacy-path-include.md`: simple path-based include marker.
- `legacy-literate.md`: named chunks tangled into source.

The tests in `tests/test_wp0010_migration_examples.py` exercise these files as
successor fixtures. They are deliberately small, but they lock down the
behaviors we most wanted to keep from `markitect-main`.

docs/processors.md
@@ -0,0 +1,81 @@
# Fenced-Block Processors

Date: 2026-05-04

## Purpose

The processor registry is the deterministic execution boundary for WP-0010.
It lets Markdown fenced blocks opt into named processors while keeping
execution explicit, inspectable, and non-magical.

Processors receive:

- the fenced content unit
- resolver-capable context
- variables and policy maps

Processors return:

- generated content
- optional generated files
- diagnostics
- dependencies
- operation provenance

No built-in processor runs arbitrary code.

## Syntax

A fenced block opts into processing by using an `mkt-<processor>` language:

````markdown
```mkt-uppercase {#shout}
hello
```
````

The processor can also be named with a `processor` attribute:

````markdown
```markdown {#example processor="identity"}
Rendered as-is by the identity processor.
```
````

## Built-In Processors

Initial deterministic processors:

- `identity`: returns the fenced block content unchanged.
- `uppercase`: returns uppercased content; mainly a registry smoke-test.
- `include`: resolves a `ref` attribute through the content reference resolver.
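
Here is a small registry sketch. It shows opt-in dispatch by `mkt-<processor>` language or by `processor` attribute, plus a result envelope. `BlockResult`, `Registry`, and `processor_name` are hypothetical and are not the package's `ProcessorRegistry` or `ProcessorResult`. The envelope omits generated files, and letting an explicit `processor` attribute win over the language is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class BlockResult:
    """Hypothetical result envelope echoing the fields listed above."""
    content: str
    diagnostics: list = field(default_factory=list)
    dependencies: list = field(default_factory=list)
    provenance: list = field(default_factory=list)


Processor = Callable[[str, dict], BlockResult]


def processor_name(lang: str, attrs: dict) -> Optional[str]:
    # Assumption: an explicit processor="..." attribute wins over the language.
    if "processor" in attrs:
        return attrs["processor"]
    if lang.startswith("mkt-"):
        return lang[len("mkt-"):]
    return None  # ordinary fenced block: not opted in


class Registry:
    def __init__(self) -> None:
        self._processors = {}

    def register(self, name: str, processor: Processor) -> None:
        if name in self._processors:
            raise ValueError(f"processor already registered: {name}")
        self._processors[name] = processor

    def run(self, lang: str, attrs: dict, content: str) -> Optional[BlockResult]:
        name = processor_name(lang, attrs)
        if name is None:
            return None
        processor = self._processors.get(name)
        if processor is None:
            # Unknown names become diagnostics instead of passing through silently.
            return BlockResult(content="", diagnostics=[f"unknown processor: {name}"])
        return processor(content, attrs)


def identity(content: str, attrs: dict) -> BlockResult:
    return BlockResult(content, provenance=[{"processor": "identity", "block": attrs.get("id")}])


def uppercase(content: str, attrs: dict) -> BlockResult:
    return BlockResult(content.upper(), provenance=[{"processor": "uppercase", "block": attrs.get("id")}])


registry = Registry()
registry.register("identity", identity)
registry.register("uppercase", uppercase)
print(registry.run("mkt-uppercase", {"id": "shout"}, "hello\n").content)  # prints HELLO
```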

Reference-backed include:

````markdown
```mkt-include {#payment ref="std:clauses.md#payment-terms"}
```
````

The include processor returns the resolved content, records the target file as
a dependency, and emits operation provenance.

## CLI

Run processors in a document:

```bash
mkt process examples/references/context.md --format json
```

Text output reports processor validity, block IDs, and the first generated
content line. JSON/YAML output includes diagnostics, dependencies, and
provenance.

## Extension Boundary

The registry is deliberately small. It does not render a final document yet and
does not execute shell, Python, SQL, or LLM calls. Those can become opt-in
processors later, but they should use the same result envelope so diagnostics,
dependencies, provenance, cache invalidation, and access-control hooks stay
consistent.

@@ -27,6 +27,10 @@ Supported operations:
 
 The API equivalent is `transform_markdown(...)`.
 
+Heading shifts are token-safe: Markdown fenced and indented code blocks are
+left untouched even if their lines look like headings. `TransformResult`
+includes structured provenance events alongside the older operation-name list.
+
 ## Compose
 
 Use `mkt compose` to concatenate Markdown inputs with predictable separators:
@@ -79,5 +83,12 @@ Resolution rules:
 directory.
 - Recursive includes are resolved up to `--max-depth`.
 - Cycles and missing files fail with explicit errors.
+- Include markers inside fenced or indented code blocks are left literal.
 
 The API equivalent is `resolve_includes(...)`.
+
+`IncludeResult` includes structured provenance events. Each include event
+records the source marker line when available, the resolved target path,
+dependency edge, selector, heading shift, and frontmatter policy. This is the
+first provenance envelope used by later WP-0010 processor, source-map, and
+explode/implode work.

@@ -32,7 +32,7 @@ and descriptions mirror the operational view.
 | `MKTT-WP-0004` | complete | done | `MKTT-WP-0001`, `MKTT-WP-0002` | Contract framework is complete and informs later validation/generation work. |
 | `MKTT-WP-0003` | complete | done | `MKTT-WP-0001`, `MKTT-WP-0002`, `MKTT-WP-0004` | Core toolkit implementation is complete. |
 | `MKTT-WP-0006` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T005` | Ready after transform/composition shape is clear; should account for future reference/provenance needs. |
-| `MKTT-WP-0010` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Trigger is satisfied; keep as the richer content-reference, processor, explode/implode, and weave/tangle track. |
+| `MKTT-WP-0010` | complete | done | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Content references, processors, explode/implode, weave/tangle, content classes, and migration examples are complete as the first WP-0010 extension layer. |
 | `MKTT-WP-0007` | P2 | todo | `MKTT-WP-0006` | First practical cache backend use case: AST/JSONPath/SQLite/FTS. |
 | `MKTT-WP-0005` | P2 | todo | `MKTT-WP-0003`, `MKTT-WP-0004` | Pick up when generation/form/context or semantic assessment pressure appears. |
 | `MKTT-WP-0011` | P2 | todo | `MKTT-WP-0003`; task-level triggers: `MKTT-WP-0010-T001`, `MKTT-WP-0010-T005` | Declarative Markdown dataflow workflows: source extraction, deterministic/assisted processing, and multi-output generation. |

examples/classes/prd-classes.yaml
@@ -0,0 +1,30 @@
classes:
  base-prd:
    slots:
      sections:
        - Problem
        - Decision
      assertions:
        tone: plain
        audience: product

  enterprise:
    extends:
      - base-prd
    slots:
      sections:
        - Compliance
      assertions:
        audience: enterprise buyers
    merge_policies:
      sections: append
      assertions: deep_merge

  enterprise-prd:
    extends:
      - enterprise
    slots:
      sections:
        - Rollout
    merge_policies:
      sections: append

examples/literate/app.md
@@ -0,0 +1,15 @@
# Literate App Example

This example explains the helper before showing the application entry point.

```python {#helpers}
def helper():
    return "ready"
```

```python {#main tangle="src/app.py"}
<<helpers>>

def main():
    return helper()
```

examples/migration/legacy-explode-source.md
@@ -0,0 +1,17 @@
---
title: Legacy Explode Successor
---

Opening material that used to be easy to lose in section-only exports.

# Overview

The successor explode flow preserves preamble, headings, order, and frontmatter.

## Detail

Nested sections remain addressable and roundtrip through the manifest.

# Follow-Up

Later sections keep their document order.

examples/migration/legacy-literate.md
@@ -0,0 +1,12 @@
# Legacy Literate Successor

```python {#config}
CONFIG = {"ready": True}
```

```python {#main tangle="src/app.py"}
<<config>>

def main():
    return CONFIG["ready"]
```

examples/migration/legacy-path-include.md
@@ -0,0 +1,3 @@
# Path Include

<!-- mkt:include path="standard/clauses.md" selector="sections[heading~=Warranty]" -->

examples/migration/legacy-transclusion-context.md
@@ -0,0 +1,13 @@
---
title: Legacy Transclusion Successor
namespaces:
  std: ./standard
---

# Contract Draft

The old broad transclusion idea is now split into path includes and
reference-backed processors.

```mkt-include {#payment-clause ref="std:clauses.md#payment"}
```

examples/migration/standard/clauses.md
@@ -0,0 +1,9 @@
# Standard Clauses

## Payment {#payment}

Payment is due within 30 days.

## Warranty {#warranty}

Warranty begins on the effective date.

examples/references/context.md
@@ -0,0 +1,26 @@
---
title: Reference Context
namespaces:
  std: ./standard
---

# Reference Context

This document declares the namespaces used by reference examples.

## Local Overview

Local sections can be addressed with `#local-overview`.

<!-- mkt:region id="summary-snippet" tags="reuse summary" -->
This named region can be resolved with `#region:summary-snippet` or
`#tag:summary`.
<!-- /mkt:region -->

```python {#example-loader tags="code demo" tangle="src/example_loader.py"}
def load_example():
    return "ready"
```

```mkt-include {#payment-example ref="std:clauses.md#payment-terms"}
```

examples/references/standard/clauses.md
@@ -0,0 +1,9 @@
# Standard Clauses

## Payment Terms {#payment-terms}

Payment is due within 30 days unless a governing contract says otherwise.

## Warranty

The warranty period starts on the effective date.
@@ -32,7 +32,26 @@ from markitect_tool.cache import (
|
|||||||
save_cache,
|
save_cache,
|
||||||
scan_markdown_files,
|
scan_markdown_files,
|
||||||
)
|
)
|
||||||
|
from markitect_tool.content_class import (
|
||||||
|
ClassCompositionResult,
|
||||||
|
ContentClass,
|
||||||
|
ContentClassRegistry,
|
||||||
|
ContentClassResolutionError,
|
||||||
|
load_content_class_file,
|
||||||
|
load_content_classes,
|
||||||
|
)
|
||||||
from markitect_tool.diagnostics import Diagnostic, SourceLocation
|
from markitect_tool.diagnostics import Diagnostic, SourceLocation
|
||||||
|
from markitect_tool.explode import (
|
||||||
|
EXPLODE_MANIFEST_NAME,
|
||||||
|
ExplodeEntry,
|
||||||
|
ExplodeError,
|
||||||
|
ExplodeManifest,
|
||||||
|
ExplodeResult,
|
||||||
|
ImplodeResult,
|
||||||
|
explode_markdown_file,
|
||||||
|
implode_markdown_directory,
|
||||||
|
load_explode_manifest,
|
||||||
|
)
|
||||||
from markitect_tool.generation import (
|
from markitect_tool.generation import (
|
||||||
GeneratedDocument,
|
GeneratedDocument,
|
||||||
GenerationHookRequest,
|
GenerationHookRequest,
|
||||||
@@ -44,21 +63,55 @@ from markitect_tool.generation import (
|
|||||||
load_generation_plan_file,
|
load_generation_plan_file,
|
||||||
run_generation_plan,
|
run_generation_plan,
|
||||||
)
|
)
|
||||||
|
from markitect_tool.literate import (
|
||||||
|
CodeChunk,
|
||||||
|
LiterateFile,
|
||||||
|
TangleResult,
|
||||||
|
WeaveResult,
|
||||||
|
discover_code_chunks,
|
||||||
|
tangle_markdown,
|
||||||
|
weave_markdown,
|
||||||
|
write_tangle_files,
|
||||||
|
)
|
||||||
from markitect_tool.ops import (
|
from markitect_tool.ops import (
|
||||||
ComposeResult,
|
ComposeResult,
|
||||||
IncludeError,
|
IncludeError,
|
||||||
IncludeResult,
|
IncludeResult,
|
||||||
|
OperationProvenance,
|
||||||
TransformResult,
|
TransformResult,
|
||||||
compose_files,
|
compose_files,
|
||||||
resolve_includes,
|
resolve_includes,
|
||||||
transform_markdown,
|
transform_markdown,
|
||||||
)
|
)
|
||||||
|
from markitect_tool.processor import (
|
||||||
|
FencedProcessorBlock,
|
||||||
|
ProcessorContext,
|
||||||
|
ProcessorOutputFile,
|
||||||
|
ProcessorRegistry,
|
||||||
|
ProcessorRequest,
|
||||||
|
ProcessorResult,
|
||||||
|
ProcessorRun,
|
||||||
|
default_processor_registry,
|
||||||
|
discover_fenced_processors,
|
||||||
|
run_fenced_processors,
|
||||||
|
)
|
||||||
from markitect_tool.query import (
|
from markitect_tool.query import (
|
||||||
InvalidQueryError,
|
InvalidQueryError,
|
||||||
QueryMatch,
|
QueryMatch,
|
||||||
extract_document,
|
extract_document,
|
||||||
query_document,
|
query_document,
|
||||||
)
|
)
|
||||||
|
from markitect_tool.reference import (
|
||||||
|
ContentUnit,
|
||||||
|
ReferenceAddress,
|
||||||
|
ReferenceContext,
|
||||||
|
ReferenceResolution,
|
||||||
|
ReferenceResolutionError,
|
||||||
|
SourceSpan as ReferenceSourceSpan,
|
||||||
|
load_namespaces,
|
||||||
|
parse_reference,
|
||||||
|
resolve_reference,
|
||||||
|
)
|
||||||
from markitect_tool.schema import (
|
from markitect_tool.schema import (
|
||||||
MarkdownSchema,
|
MarkdownSchema,
|
||||||
SchemaValidationResult,
|
SchemaValidationResult,
|
||||||
@@ -109,8 +162,23 @@ __all__ = [
     "load_cache",
     "save_cache",
     "scan_markdown_files",
+    "ClassCompositionResult",
+    "ContentClass",
+    "ContentClassRegistry",
+    "ContentClassResolutionError",
+    "load_content_class_file",
+    "load_content_classes",
     "Diagnostic",
     "SourceLocation",
+    "EXPLODE_MANIFEST_NAME",
+    "ExplodeEntry",
+    "ExplodeError",
+    "ExplodeManifest",
+    "ExplodeResult",
+    "ImplodeResult",
+    "explode_markdown_file",
+    "implode_markdown_directory",
+    "load_explode_manifest",
     "GeneratedDocument",
     "GenerationHookRequest",
     "GenerationHookResult",
@@ -120,17 +188,45 @@ __all__ = [
     "generate_with_hook",
     "load_generation_plan_file",
     "run_generation_plan",
+    "CodeChunk",
+    "LiterateFile",
+    "TangleResult",
+    "WeaveResult",
+    "discover_code_chunks",
+    "tangle_markdown",
+    "weave_markdown",
+    "write_tangle_files",
     "ComposeResult",
     "IncludeError",
     "IncludeResult",
+    "OperationProvenance",
     "TransformResult",
     "compose_files",
     "resolve_includes",
     "transform_markdown",
+    "FencedProcessorBlock",
+    "ProcessorContext",
+    "ProcessorOutputFile",
+    "ProcessorRegistry",
+    "ProcessorRequest",
+    "ProcessorResult",
+    "ProcessorRun",
+    "default_processor_registry",
+    "discover_fenced_processors",
+    "run_fenced_processors",
     "InvalidQueryError",
     "QueryMatch",
     "extract_document",
     "query_document",
+    "ContentUnit",
+    "ReferenceAddress",
+    "ReferenceContext",
+    "ReferenceResolution",
+    "ReferenceResolutionError",
+    "ReferenceSourceSpan",
+    "load_namespaces",
+    "parse_reference",
+    "resolve_reference",
     "MissingTemplateVariable",
     "TemplateAnalysis",
     "TemplateError",
@@ -16,6 +16,10 @@ from markitect_tool.cache import (
     load_cache,
     save_cache,
 )
+from markitect_tool.content_class import (
+    ContentClassResolutionError,
+    load_content_class_file,
+)
 from markitect_tool.core import parse_markdown_file
 from markitect_tool.contract import (
     ContractLoaderError,
@@ -24,6 +28,11 @@ from markitect_tool.contract import (
     load_contract_file,
     validate_contract,
 )
+from markitect_tool.explode import (
+    ExplodeError,
+    explode_markdown_file,
+    implode_markdown_directory,
+)
 from markitect_tool.generation import (
     GenerationPlanError,
     generate_stub_from_contract,
@@ -31,8 +40,16 @@ from markitect_tool.generation import (
     load_generation_plan_file,
     run_generation_plan,
 )
+from markitect_tool.literate import tangle_markdown, weave_markdown, write_tangle_files
 from markitect_tool.ops import IncludeError, compose_files, resolve_includes, transform_markdown
+from markitect_tool.processor import ProcessorContext, run_fenced_processors
 from markitect_tool.query import InvalidQueryError, extract_document, query_document
+from markitect_tool.reference import (
+    ReferenceContext,
+    ReferenceResolutionError,
+    load_namespaces,
+    resolve_reference,
+)
 from markitect_tool.schema import load_schema_file, validate_markdown_file, validate_schema
 from markitect_tool.template import (
     MissingTemplateVariable,
@@ -296,6 +313,224 @@ def include(
     _emit_markdown_result(result.to_dict(), output_format, output)
 
 
+@main.command()
+@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
+@click.option(
+    "--output-dir",
+    required=True,
+    type=click.Path(file_okay=False, path_type=Path),
+    help="Directory to write exploded Markdown files and manifest into.",
+)
+@click.option(
+    "--variant",
+    type=click.Choice(["flat", "hierarchical"], case_sensitive=False),
+    default="flat",
+    show_default=True,
+)
+@click.option("--force", is_flag=True, help="Allow writing into a non-empty output directory.")
+@click.option(
+    "--format",
+    "output_format",
+    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
+    default="text",
+    show_default=True,
+)
+def explode(
+    file: Path,
+    output_dir: Path,
+    variant: str,
+    force: bool,
+    output_format: str,
+) -> None:
+    """Explode a Markdown file into reversible section files."""
+
+    try:
+        result = explode_markdown_file(file, output_dir, variant=variant, overwrite=force)
+    except ExplodeError as exc:
+        raise click.ClickException(str(exc)) from exc
+    _emit_explode_result(result.to_dict(), output_format)
+
+
+@main.command()
+@click.argument("directory", type=click.Path(exists=True, file_okay=False, path_type=Path))
+@click.option(
+    "--manifest",
+    "manifest_path",
+    type=click.Path(exists=True, dir_okay=False, path_type=Path),
+    help="Manifest path. Defaults to markitect-explode.yaml in the input directory.",
+)
+@click.option(
+    "--output",
+    type=click.Path(dir_okay=False, path_type=Path),
+    help="Write imploded Markdown to a file.",
+)
+@click.option(
+    "--format",
+    "output_format",
+    type=click.Choice(["markdown", "json", "yaml"], case_sensitive=False),
+    default="markdown",
+    show_default=True,
+)
+def implode(
+    directory: Path,
+    manifest_path: Path | None,
+    output: Path | None,
+    output_format: str,
+) -> None:
+    """Implode a Markdown directory created by `mkt explode`."""
+
+    try:
+        result = implode_markdown_directory(directory, manifest_path=manifest_path)
+    except ExplodeError as exc:
+        raise click.ClickException(str(exc)) from exc
+    _emit_markdown_result(result.to_dict(), output_format, output)
+
+
+@main.group("ref")
+def ref_group() -> None:
+    """Resolve namespaced Markdown content references."""
+
+
+@ref_group.command("resolve")
+@click.argument("context_file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
+@click.argument("reference")
+@click.option(
+    "--root",
+    type=click.Path(exists=True, file_okay=False, path_type=Path),
+    default=Path("."),
+    show_default=True,
+    help="Root that relative paths and namespaces must stay within.",
+)
+@click.option(
+    "--format",
+    "output_format",
+    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
+    default="text",
+    show_default=True,
+)
+def ref_resolve(context_file: Path, reference: str, root: Path, output_format: str) -> None:
+    """Resolve a content reference using a Markdown document as context."""
+
+    context_document = parse_markdown_file(context_file)
+    context = ReferenceContext.from_document(
+        context_document,
+        root=root,
+        current_path=context_file,
+    )
+    try:
+        resolution = resolve_reference(reference, context=context)
+    except ReferenceResolutionError as exc:
+        raise click.ClickException(str(exc)) from exc
+    _emit_reference_result(resolution.to_dict(), output_format)
+
+
+@main.command("process")
+@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
+@click.option(
+    "--root",
+    type=click.Path(exists=True, file_okay=False, path_type=Path),
+    default=Path("."),
+    show_default=True,
+    help="Root used for relative processor references.",
+)
+@click.option(
+    "--format",
+    "output_format",
+    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
+    default="text",
+    show_default=True,
+)
+def process(file: Path, root: Path, output_format: str) -> None:
+    """Run deterministic fenced-block processors in a Markdown file."""
+
+    document = parse_markdown_file(file)
+    context = ProcessorContext(
+        root=root,
+        current_path=file,
+        namespaces=load_namespaces(document.frontmatter),
+    )
+    result = run_fenced_processors(
+        file.read_text(encoding="utf-8"),
+        context=context,
+        source_path=file,
+    )
+    _emit_processor_run(result.to_dict(), output_format)
+    raise click.exceptions.Exit(0 if result.valid else 1)
+
+
+@main.group("class")
+def class_group() -> None:
+    """Resolve deterministic content classes."""
+
+
+@class_group.command("resolve")
+@click.argument("class_file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
+@click.argument("class_name")
+@click.option(
+    "--format",
+    "output_format",
+    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
+    default="text",
+    show_default=True,
+)
+def class_resolve(class_file: Path, class_name: str, output_format: str) -> None:
+    """Resolve content class inheritance and merged slots."""
+
+    try:
+        registry = load_content_class_file(class_file)
+        result = registry.compose(class_name)
+    except ContentClassResolutionError as exc:
+        raise click.ClickException(str(exc)) from exc
+    _emit_content_class_result(result.to_dict(), output_format)
+    raise click.exceptions.Exit(0 if result.valid else 1)
+
+
+@main.command()
+@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
+@click.option(
+    "--output-dir",
+    type=click.Path(file_okay=False, path_type=Path),
+    help="Write tangled files under this directory. Omit for dry JSON/YAML/text output.",
+)
+@click.option(
+    "--format",
+    "output_format",
+    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
+    default="text",
+    show_default=True,
+)
+def tangle(file: Path, output_dir: Path | None, output_format: str) -> None:
+    """Tangle named Markdown code chunks into target files."""
+
+    result = tangle_markdown(file.read_text(encoding="utf-8"), source_path=file)
+    data = result.to_dict()
+    if output_dir and result.valid:
+        data["written_files"] = write_tangle_files(result, output_dir)
+    _emit_tangle_result(data, output_format)
+    raise click.exceptions.Exit(0 if result.valid else 1)
+
+
+@main.command()
+@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
+@click.option(
+    "--output",
+    type=click.Path(dir_okay=False, path_type=Path),
+    help="Write woven Markdown to a file.",
+)
+@click.option(
+    "--format",
+    "output_format",
+    type=click.Choice(["markdown", "json", "yaml"], case_sensitive=False),
+    default="markdown",
+    show_default=True,
+)
+def weave(file: Path, output: Path | None, output_format: str) -> None:
+    """Weave Markdown documentation with a deterministic chunk index."""
+
+    result = weave_markdown(file.read_text(encoding="utf-8"), source_path=file)
+    _emit_markdown_result(result.to_dict(), output_format, output)
+
+
 @main.group()
 def cache() -> None:
     """Fingerprint Markdown files and detect changed inputs."""
@@ -788,6 +1023,83 @@ def _emit_cache_data(data: dict, output_format: str) -> None:
         click.echo(f"written: {data['written']}")
 
 
+def _emit_reference_result(data: dict, output_format: str) -> None:
+    if output_format == "json":
+        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
+    elif output_format == "yaml":
+        click.echo(yaml.safe_dump(data, sort_keys=False))
+    else:
+        click.echo(f"{data['count']} unit(s)")
+        click.echo(f"target: {data['target_path']}")
+        for unit in data["units"]:
+            span = unit.get("span", {})
+            line = f":{span['line_start']}" if span.get("line_start") else ""
+            click.echo(f"- {unit['kind']} {unit['unit_id']} {unit['source_path']}{line}")
+            if unit.get("name"):
+                click.echo(f" {unit['name']}")
+
+
+def _emit_explode_result(data: dict, output_format: str) -> None:
+    if output_format == "json":
+        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
+    elif output_format == "yaml":
+        click.echo(yaml.safe_dump(data, sort_keys=False))
+    else:
+        manifest = data["manifest"]
+        click.echo(f"manifest: {data['manifest_path']}")
+        click.echo(f"variant: {manifest['variant']}")
+        click.echo(f"entries: {len(manifest['entries'])}")
+        for entry in manifest["entries"]:
+            click.echo(f"- {entry['kind']} {entry['file']}")
+
+
+def _emit_processor_run(data: dict, output_format: str) -> None:
+    if output_format == "json":
+        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
+    elif output_format == "yaml":
+        click.echo(yaml.safe_dump(data, sort_keys=False))
+    else:
+        click.echo("valid" if data["valid"] else "invalid")
+        click.echo(f"processors: {data['count']}")
+        for block, result in zip(data["blocks"], data["results"], strict=False):
+            line = f":{block['line_start']}" if block.get("line_start") else ""
+            click.echo(f"- {block['processor']} {block['unit_id']}{line}")
+            if result.get("content"):
+                click.echo(f" content: {result['content'].splitlines()[0]}")
+            for diagnostic in result.get("diagnostics", []):
+                click.echo(f" [{diagnostic['severity']}] {diagnostic['code']}: {diagnostic['message']}")
+
+
+def _emit_content_class_result(data: dict, output_format: str) -> None:
+    if output_format == "json":
+        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
+    elif output_format == "yaml":
+        click.echo(yaml.safe_dump(data, sort_keys=False))
+    else:
+        click.echo("valid" if data["valid"] else "invalid")
+        click.echo("linearization: " + " -> ".join(data["linearization"]))
+        for slot, value in data.get("slots", {}).items():
+            click.echo(f"- {slot}: {value}")
+        for diagnostic in data.get("diagnostics", []):
+            click.echo(f"! [{diagnostic['severity']}] {diagnostic['code']}: {diagnostic['message']}")
+
+
+def _emit_tangle_result(data: dict, output_format: str) -> None:
+    if output_format == "json":
+        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
+    elif output_format == "yaml":
+        click.echo(yaml.safe_dump(data, sort_keys=False))
+    else:
+        click.echo("valid" if data["valid"] else "invalid")
+        click.echo(f"files: {len(data['files'])}")
+        for file in data["files"]:
+            click.echo(f"- {file['path']}: {', '.join(file['chunk_ids'])}")
+        for diagnostic in data.get("diagnostics", []):
+            click.echo(f"! [{diagnostic['severity']}] {diagnostic['code']}: {diagnostic['message']}")
+        for written in data.get("written_files", []):
+            click.echo(f"written: {written}")
+
+
 def _emit_jsonish(data: dict, output_format: str) -> None:
     if output_format == "yaml":
         click.echo(yaml.safe_dump(data, sort_keys=False))
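The `class resolve` command prints the merged `slots` returned by `ContentClassRegistry.compose`. As an illustration of how that fold behaves, here is a minimal standalone sketch copied from the `_merge_slot`/`_deep_merge` logic in `content_class/engine.py`, so it runs without `markitect_tool` installed. It is a simplification under stated assumptions: it takes the linearization as an input instead of computing it, raises plain `ValueError` instead of `ContentClassResolutionError`, and the helper names and the `CLASSES` mapping (the `base-prd`/`enterprise` example from the content-classes doc) are illustrative only.

```python
from copy import deepcopy
from typing import Any


def as_list(value: Any) -> list[Any]:
    # Scalars are wrapped so append/prepend always yield a list.
    return value if isinstance(value, list) else [value]


def deep_merge(left: dict[str, Any], right: dict[str, Any]) -> dict[str, Any]:
    # Recursively merge nested mappings; non-mapping values from `right` win.
    merged = deepcopy(left)
    for key, value in right.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def merge_slot(existing: Any, value: Any, policy: str) -> Any:
    incoming = deepcopy(value)
    # The first contributor (or the `replace` policy) simply sets the slot.
    if existing is None or policy == "replace":
        return incoming
    if policy == "append":
        return as_list(existing) + as_list(incoming)
    if policy == "prepend":
        return as_list(incoming) + as_list(existing)
    if policy == "deep_merge":
        if not (isinstance(existing, dict) and isinstance(incoming, dict)):
            raise ValueError("deep_merge requires mapping values")
        return deep_merge(existing, incoming)
    if policy == "error_on_conflict":
        if existing != incoming:
            raise ValueError("slot conflict")
        return existing
    raise ValueError(f"Unknown merge policy `{policy}`")


def compose(classes: dict[str, dict[str, Any]], linearization: list[str]) -> dict[str, Any]:
    # Walk from base to leaf; each class's own merge_policies decide how its slots merge.
    slots: dict[str, Any] = {}
    for name in reversed(linearization):
        definition = classes[name]
        policies = definition.get("merge_policies", {})
        for slot, value in definition.get("slots", {}).items():
            slots[slot] = merge_slot(slots.get(slot), value, policies.get(slot, "replace"))
    return slots


CLASSES = {
    "base-prd": {"slots": {"sections": ["Problem", "Decision"]}},
    "enterprise": {
        "extends": ["base-prd"],
        "slots": {"sections": ["Compliance"]},
        "merge_policies": {"sections": "append"},
    },
}

print(compose(CLASSES, ["enterprise", "base-prd"]))
# {'sections': ['Problem', 'Decision', 'Compliance']}
```

Because the fold walks the linearization from base to leaf, the policy consulted for a slot is the one declared on the class contributing the incoming value, which is why `enterprise` (not `base-prd`) carries `sections: append`.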
19
src/markitect_tool/content_class/__init__.py
Normal file
@@ -0,0 +1,19 @@
+"""Deterministic content class composition."""
+
+from markitect_tool.content_class.engine import (
+    ClassCompositionResult,
+    ContentClass,
+    ContentClassRegistry,
+    ContentClassResolutionError,
+    load_content_class_file,
+    load_content_classes,
+)
+
+__all__ = [
+    "ClassCompositionResult",
+    "ContentClass",
+    "ContentClassRegistry",
+    "ContentClassResolutionError",
+    "load_content_class_file",
+    "load_content_classes",
+]
225
src/markitect_tool/content_class/engine.py
Normal file
@@ -0,0 +1,225 @@
+"""Small deterministic content class resolver."""
+
+from __future__ import annotations
+
+from copy import deepcopy
+from dataclasses import asdict, dataclass, field
+from pathlib import Path
+from typing import Any
+
+import yaml
+
+from markitect_tool.diagnostics import Diagnostic
+
+
+class ContentClassResolutionError(ValueError):
+    """Raised when content class definitions cannot be loaded."""
+
+
+@dataclass(frozen=True)
+class ContentClass:
+    """A data-defined content class."""
+
+    name: str
+    extends: list[str] = field(default_factory=list)
+    slots: dict[str, Any] = field(default_factory=dict)
+    merge_policies: dict[str, str] = field(default_factory=dict)
+
+    def to_dict(self) -> dict[str, Any]:
+        return {key: value for key, value in asdict(self).items() if value not in ({}, [], None)}
+
+
+@dataclass(frozen=True)
+class ClassCompositionResult:
+    """Resolved content class slots plus diagnostics."""
+
+    class_name: str
+    linearization: list[str]
+    slots: dict[str, Any]
+    diagnostics: list[Diagnostic] = field(default_factory=list)
+
+    @property
+    def valid(self) -> bool:
+        return not any(diagnostic.severity == "error" for diagnostic in self.diagnostics)
+
+    def to_dict(self) -> dict[str, Any]:
+        return {
+            "valid": self.valid,
+            "class_name": self.class_name,
+            "linearization": self.linearization,
+            "slots": self.slots,
+            "diagnostics": [diagnostic.to_dict() for diagnostic in self.diagnostics],
+        }
+
+
+class ContentClassRegistry:
+    """Registry and resolver for content classes."""
+
+    def __init__(self, classes: dict[str, ContentClass] | None = None) -> None:
+        self.classes = classes or {}
+
+    def add(self, content_class: ContentClass) -> None:
+        self.classes[content_class.name] = content_class
+
+    def linearize(self, class_name: str) -> list[str]:
+        if class_name not in self.classes:
+            raise ContentClassResolutionError(f"Unknown content class `{class_name}`")
+        return self._linearize(class_name, [])
+
+    def compose(self, class_name: str) -> ClassCompositionResult:
+        diagnostics: list[Diagnostic] = []
+        try:
+            linearization = self.linearize(class_name)
+        except ContentClassResolutionError as exc:
+            return ClassCompositionResult(
+                class_name=class_name,
+                linearization=[],
+                slots={},
+                diagnostics=[
+                    Diagnostic(
+                        severity="error",
+                        code="content_class.resolution_error",
+                        message=str(exc),
+                    )
+                ],
+            )
+
+        slots: dict[str, Any] = {}
+        for name in reversed(linearization):
+            content_class = self.classes[name]
+            for slot, value in content_class.slots.items():
+                policy = content_class.merge_policies.get(slot, "replace")
+                try:
+                    slots[slot] = _merge_slot(slots.get(slot), value, policy)
+                except ContentClassResolutionError as exc:
+                    diagnostics.append(
+                        Diagnostic(
+                            severity="error",
+                            code="content_class.merge_conflict",
+                            message=str(exc),
+                            details={"class": name, "slot": slot, "policy": policy},
+                        )
+                    )
+        return ClassCompositionResult(
+            class_name=class_name,
+            linearization=linearization,
+            slots=slots,
+            diagnostics=diagnostics,
+        )
+
+    def _linearize(self, class_name: str, stack: list[str]) -> list[str]:
+        if class_name in stack:
+            raise ContentClassResolutionError(
+                "Cyclic content class inheritance: " + " -> ".join(stack + [class_name])
+            )
+        content_class = self.classes[class_name]
+        parent_mros = [
+            self._linearize(parent, stack + [class_name])
+            for parent in content_class.extends
+            if _known_parent(parent, self.classes)
+        ]
+        missing = [parent for parent in content_class.extends if parent not in self.classes]
+        if missing:
+            raise ContentClassResolutionError(
+                f"Content class `{class_name}` extends unknown class(es): {', '.join(missing)}"
+            )
+        return [class_name] + _c3_merge(parent_mros + [list(content_class.extends)])
+
+
+def load_content_class_file(path: str | Path) -> ContentClassRegistry:
+    """Load content class definitions from YAML."""
+
+    data = yaml.safe_load(Path(path).read_text(encoding="utf-8"))
+    if not isinstance(data, dict):
+        raise ContentClassResolutionError("Content class file must be a mapping")
+    return load_content_classes(data)
+
+
+def load_content_classes(data: dict[str, Any]) -> ContentClassRegistry:
+    """Load content class definitions from a mapping."""
+
+    raw_classes = data.get("classes", data)
+    if not isinstance(raw_classes, dict):
+        raise ContentClassResolutionError("Content classes must be a mapping")
+    classes: dict[str, ContentClass] = {}
+    for name, raw_class in raw_classes.items():
+        if not isinstance(raw_class, dict):
+            raise ContentClassResolutionError(f"Content class `{name}` must be a mapping")
+        extends = raw_class.get("extends", [])
+        if isinstance(extends, str):
+            extends = [extends]
+        if not isinstance(extends, list):
+            raise ContentClassResolutionError(f"Content class `{name}` extends must be a list")
+        slots = raw_class.get("slots", {})
+        policies = raw_class.get("merge_policies", {})
+        if not isinstance(slots, dict) or not isinstance(policies, dict):
+            raise ContentClassResolutionError(
+                f"Content class `{name}` slots and merge_policies must be mappings"
+            )
+        classes[str(name)] = ContentClass(
+            name=str(name),
+            extends=[str(parent) for parent in extends],
+            slots=slots,
+            merge_policies={str(key): str(value) for key, value in policies.items()},
+        )
+    return ContentClassRegistry(classes)
+
+
+def _c3_merge(sequences: list[list[str]]) -> list[str]:
+    result: list[str] = []
+    sequences = [list(sequence) for sequence in sequences if sequence]
+    while sequences:
+        candidate = None
+        for sequence in sequences:
+            head = sequence[0]
+            if not any(head in other[1:] for other in sequences):
+                candidate = head
+                break
+        if candidate is None:
+            raise ContentClassResolutionError("Inconsistent content class precedence order")
+        result.append(candidate)
+        sequences = [
+            [item for item in sequence if item != candidate]
+            for sequence in sequences
+        ]
+        sequences = [sequence for sequence in sequences if sequence]
+    return result
+
+
+def _merge_slot(existing: Any, value: Any, policy: str) -> Any:
+    incoming = deepcopy(value)
+    if existing is None:
+        return incoming
+    if policy == "replace":
+        return incoming
+    if policy == "append":
+        return _as_list(existing) + _as_list(incoming)
+    if policy == "prepend":
+        return _as_list(incoming) + _as_list(existing)
+    if policy == "deep_merge":
+        if not isinstance(existing, dict) or not isinstance(incoming, dict):
+            raise ContentClassResolutionError("deep_merge requires mapping values")
+        return _deep_merge(existing, incoming)
+    if policy == "error_on_conflict":
+        if existing != incoming:
+            raise ContentClassResolutionError("slot conflict")
+        return existing
+    raise ContentClassResolutionError(f"Unknown merge policy `{policy}`")
+
+
+def _deep_merge(left: dict[str, Any], right: dict[str, Any]) -> dict[str, Any]:
+    merged = deepcopy(left)
+    for key, value in right.items():
+        if isinstance(merged.get(key), dict) and isinstance(value, dict):
+            merged[key] = _deep_merge(merged[key], value)
+        else:
+            merged[key] = deepcopy(value)
+    return merged
+
+
+def _as_list(value: Any) -> list[Any]:
+    return value if isinstance(value, list) else [value]
+
+
+def _known_parent(parent: str, classes: dict[str, ContentClass]) -> bool:
+    return parent in classes
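To see what `_linearize` and `_c3_merge` produce, here is a standalone sketch of the same C3 procedure run on two hypothetical hierarchies: a diamond (`leaf` extends `left` and `right`, which both extend `base`) and a set of classes whose parent orders contradict each other. It is a simplification, not the module itself: a plain `dict` of parent lists stands in for `ContentClassRegistry`, errors are plain `ValueError` rather than `ContentClassResolutionError`, and all class names are made up for the example.

```python
def c3_merge(sequences: list[list[str]]) -> list[str]:
    """C3 merge of parent linearizations (mirrors `_c3_merge`)."""
    result: list[str] = []
    seqs = [list(seq) for seq in sequences if seq]
    while seqs:
        # Take the first head that does not appear in the tail of any sequence.
        for seq in seqs:
            head = seq[0]
            if not any(head in other[1:] for other in seqs):
                break
        else:
            raise ValueError("Inconsistent content class precedence order")
        result.append(head)
        # Drop the chosen class everywhere, then discard exhausted sequences.
        seqs = [[item for item in seq if item != head] for seq in seqs]
        seqs = [seq for seq in seqs if seq]
    return result


def linearize(name: str, extends: dict[str, list[str]], stack: tuple[str, ...] = ()) -> list[str]:
    """Return `name` followed by its C3-ordered ancestors (mirrors `_linearize`)."""
    if name in stack:
        raise ValueError("Cyclic content class inheritance: " + " -> ".join(stack + (name,)))
    if name not in extends:
        raise ValueError(f"Unknown content class `{name}`")
    parents = extends[name]
    parent_mros = [linearize(parent, extends, stack + (name,)) for parent in parents]
    return [name] + c3_merge(parent_mros + [list(parents)])


DIAMOND = {"base": [], "left": ["base"], "right": ["base"], "leaf": ["left", "right"]}
print(linearize("leaf", DIAMOND))  # ['leaf', 'left', 'right', 'base']

# `x` orders a before b, `y` orders b before a, so `z` has no consistent order.
CONFLICT = {"a": [], "b": [], "x": ["a", "b"], "y": ["b", "a"], "z": ["x", "y"]}
try:
    linearize("z", CONFLICT)
except ValueError as exc:
    print(exc)  # Inconsistent content class precedence order
```

In the diamond, `base` is placed last even though both `left` and `right` list it first among their own ancestors; that is the monotonic, deterministic ordering the linearization section of the content-classes doc relies on, and `compose` then merges slots in the reverse of this list.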
25
src/markitect_tool/explode/__init__.py
Normal file
@@ -0,0 +1,25 @@
+"""Reversible explode/implode operations for Markdown documents."""
+
+from markitect_tool.explode.engine import (
+    EXPLODE_MANIFEST_NAME,
+    ExplodeEntry,
+    ExplodeError,
+    ExplodeManifest,
+    ExplodeResult,
+    ImplodeResult,
+    explode_markdown_file,
+    implode_markdown_directory,
+    load_explode_manifest,
+)
+
+__all__ = [
+    "EXPLODE_MANIFEST_NAME",
+    "ExplodeEntry",
+    "ExplodeError",
+    "ExplodeManifest",
+    "ExplodeResult",
+    "ImplodeResult",
+    "explode_markdown_file",
+    "implode_markdown_directory",
+    "load_explode_manifest",
+]
324
src/markitect_tool/explode/engine.py
Normal file
@@ -0,0 +1,324 @@
+"""Manifest-first reversible explode/implode for Markdown files."""
+
+from __future__ import annotations
+
+import hashlib
+import re
+from dataclasses import asdict, dataclass, field
+from pathlib import Path
+from typing import Any
+
+import yaml
+
+from markitect_tool.core import Heading, parse_markdown
+
+
+EXPLODE_MANIFEST_NAME = "markitect-explode.yaml"
+
+
+class ExplodeError(ValueError):
+    """Raised when explode or implode cannot preserve a safe roundtrip."""
+
+
+@dataclass(frozen=True)
+class ExplodeEntry:
+    """One file entry in an exploded Markdown directory."""
+
+    kind: str
+    file: str
+    order: int
+    unit_id: str
+    line_start: int
+    line_end: int
+    heading_level: int | None = None
+    heading_text: str | None = None
+    content_hash: str = ""
+
+    def to_dict(self) -> dict[str, Any]:
+        return {key: value for key, value in asdict(self).items() if value is not None}
+
+
+@dataclass(frozen=True)
+class ExplodeManifest:
+    """Manifest used to implode an exploded Markdown directory."""
+
+    version: int
+    source_path: str
+    source_hash: str
+    variant: str
+    frontmatter_raw: str = ""
+    entries: list[ExplodeEntry] = field(default_factory=list)
+
+    def to_dict(self) -> dict[str, Any]:
+        return {
+            "version": self.version,
+            "source_path": self.source_path,
+            "source_hash": self.source_hash,
+            "variant": self.variant,
+            "frontmatter_raw": self.frontmatter_raw,
+            "entries": [entry.to_dict() for entry in self.entries],
+        }
+
+
+@dataclass(frozen=True)
+class ExplodeResult:
+    """Result of exploding a Markdown file into a directory."""
+
+    manifest_path: str
+    output_dir: str
+    manifest: ExplodeManifest
+    written_files: list[str]
+
+    def to_dict(self) -> dict[str, Any]:
+        return {
+            "manifest_path": self.manifest_path,
+            "output_dir": self.output_dir,
+            "manifest": self.manifest.to_dict(),
+            "written_files": self.written_files,
+        }
+
+
+@dataclass(frozen=True)
+class ImplodeResult:
+    """Result of rebuilding Markdown from an explode manifest."""
+
+    markdown: str
+    manifest_path: str
+    source_hash: str
+    current_hash: str
+    entries: list[str]
+
+    def to_dict(self) -> dict[str, Any]:
+        return asdict(self)
+
+
+def explode_markdown_file(
+    path: str | Path,
+    output_dir: str | Path,
+    *,
+    variant: str = "flat",
+    overwrite: bool = False,
+) -> ExplodeResult:
+    """Explode a Markdown file into section files plus a roundtrip manifest."""
+
+    if variant not in {"flat", "hierarchical"}:
+        raise ExplodeError("Explode variant must be `flat` or `hierarchical`")
+
+    source_path = Path(path)
+    target_dir = Path(output_dir)
+    markdown = source_path.read_text(encoding="utf-8")
+    if target_dir.exists() and any(target_dir.iterdir()) and not overwrite:
+        raise ExplodeError(f"Output directory is not empty: {target_dir}")
+    target_dir.mkdir(parents=True, exist_ok=True)
+
+    frontmatter_raw, body_start_line = _split_frontmatter_raw(markdown)
+    entries_with_text = _explode_entries(markdown, body_start_line, variant)
+    written_files: list[str] = []
+    entries: list[ExplodeEntry] = []
+
+    for entry, text in entries_with_text:
+        entry_path = _safe_entry_path(target_dir, entry.file)
+        entry_path.parent.mkdir(parents=True, exist_ok=True)
+        entry_path.write_text(text, encoding="utf-8")
+        written_files.append(str(entry_path))
+        entries.append(entry)
+
+    manifest = ExplodeManifest(
+        version=1,
+        source_path=str(source_path),
|
||||||
|
source_hash=_hash_text(markdown),
|
||||||
|
variant=variant,
|
||||||
|
frontmatter_raw=frontmatter_raw,
|
||||||
|
entries=entries,
|
||||||
|
)
|
||||||
|
manifest_path = target_dir / EXPLODE_MANIFEST_NAME
|
||||||
|
manifest_path.write_text(yaml.safe_dump(manifest.to_dict(), sort_keys=False), encoding="utf-8")
|
||||||
|
return ExplodeResult(
|
||||||
|
manifest_path=str(manifest_path),
|
||||||
|
output_dir=str(target_dir),
|
||||||
|
manifest=manifest,
|
||||||
|
written_files=written_files + [str(manifest_path)],
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def implode_markdown_directory(
|
||||||
|
directory: str | Path,
|
||||||
|
*,
|
||||||
|
manifest_path: str | Path | None = None,
|
||||||
|
) -> ImplodeResult:
|
||||||
|
"""Implode a Markdown directory created by :func:`explode_markdown_file`."""
|
||||||
|
|
||||||
|
root = Path(directory)
|
||||||
|
manifest_file = Path(manifest_path) if manifest_path else root / EXPLODE_MANIFEST_NAME
|
||||||
|
manifest = load_explode_manifest(manifest_file)
|
||||||
|
parts = [manifest.frontmatter_raw]
|
||||||
|
entry_files: list[str] = []
|
||||||
|
|
||||||
|
for entry in manifest.entries:
|
||||||
|
entry_path = _safe_entry_path(root, entry.file)
|
||||||
|
if not entry_path.exists() or not entry_path.is_file():
|
||||||
|
raise ExplodeError(f"Exploded entry file not found: {entry.file}")
|
||||||
|
parts.append(entry_path.read_text(encoding="utf-8"))
|
||||||
|
entry_files.append(str(entry_path))
|
||||||
|
|
||||||
|
markdown = "".join(parts)
|
||||||
|
return ImplodeResult(
|
||||||
|
markdown=markdown,
|
||||||
|
manifest_path=str(manifest_file),
|
||||||
|
source_hash=manifest.source_hash,
|
||||||
|
current_hash=_hash_text(markdown),
|
||||||
|
entries=entry_files,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def load_explode_manifest(path: str | Path) -> ExplodeManifest:
|
||||||
|
"""Load an explode manifest from YAML."""
|
||||||
|
|
||||||
|
manifest_path = Path(path)
|
||||||
|
data = yaml.safe_load(manifest_path.read_text(encoding="utf-8"))
|
||||||
|
if not isinstance(data, dict):
|
||||||
|
raise ExplodeError("Explode manifest must be a mapping")
|
||||||
|
entries = data.get("entries", [])
|
||||||
|
if not isinstance(entries, list):
|
||||||
|
raise ExplodeError("Explode manifest entries must be a list")
|
||||||
|
return ExplodeManifest(
|
||||||
|
version=int(data.get("version", 1)),
|
||||||
|
source_path=str(data.get("source_path", "")),
|
||||||
|
source_hash=str(data.get("source_hash", "")),
|
||||||
|
variant=str(data.get("variant", "flat")),
|
||||||
|
frontmatter_raw=str(data.get("frontmatter_raw", "")),
|
||||||
|
entries=[_entry_from_mapping(entry) for entry in entries],
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _explode_entries(
|
||||||
|
markdown: str,
|
||||||
|
body_start_line: int,
|
||||||
|
variant: str,
|
||||||
|
) -> list[tuple[ExplodeEntry, str]]:
|
||||||
|
lines = markdown.splitlines(keepends=True)
|
||||||
|
headings = parse_markdown(markdown).headings
|
||||||
|
entries: list[tuple[ExplodeEntry, str]] = []
|
||||||
|
used_ids: dict[str, int] = {}
|
||||||
|
order = 0
|
||||||
|
|
||||||
|
first_heading_line = headings[0].line if headings else len(lines) + 1
|
||||||
|
preamble_text = "".join(lines[body_start_line - 1:first_heading_line - 1])
|
||||||
|
if preamble_text or not headings:
|
||||||
|
entry = ExplodeEntry(
|
||||||
|
kind="preamble",
|
||||||
|
file="00-preamble.md",
|
||||||
|
order=order,
|
||||||
|
unit_id="preamble",
|
||||||
|
line_start=body_start_line,
|
||||||
|
line_end=max(first_heading_line - 1, body_start_line),
|
||||||
|
content_hash=_hash_text(preamble_text),
|
||||||
|
)
|
||||||
|
entries.append((entry, preamble_text))
|
||||||
|
order += 1
|
||||||
|
|
||||||
|
hierarchy: dict[int, str] = {}
|
||||||
|
for index, heading in enumerate(headings):
|
||||||
|
start = heading.line
|
||||||
|
end = headings[index + 1].line - 1 if index + 1 < len(headings) else len(lines)
|
||||||
|
text = "".join(lines[start - 1:end])
|
||||||
|
unit_id = _dedupe_id(_slug(_heading_title(heading)), used_ids)
|
||||||
|
file_path = _entry_file_for_heading(heading, index + 1, unit_id, variant, hierarchy)
|
||||||
|
entry = ExplodeEntry(
|
||||||
|
kind="section",
|
||||||
|
file=file_path,
|
||||||
|
order=order,
|
||||||
|
unit_id=unit_id,
|
||||||
|
line_start=start,
|
||||||
|
line_end=end,
|
||||||
|
heading_level=heading.level,
|
||||||
|
heading_text=heading.text,
|
||||||
|
content_hash=_hash_text(text),
|
||||||
|
)
|
||||||
|
entries.append((entry, text))
|
||||||
|
order += 1
|
||||||
|
|
||||||
|
return entries
|
||||||
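The line-range bookkeeping in `_explode_entries` is what makes implode lossless. The preamble covers body lines up to the first heading, and each section runs from its heading to the line before the next heading (or to end of file). The sketch below reproduces that slicing under one stated assumption: a simple ATX regex stands in for `parse_markdown(...).headings`, so headings inside fenced code, setext headings, and closing `#` runs are not handled. Rejoining the raw frontmatter with the slices should reproduce the source byte for byte, which is the property `ImplodeResult` exposes through `source_hash` and `current_hash`.

```python
import hashlib
import re

# Assumption: a plain regex stands in for `parse_markdown(...).headings`;
# it ignores fenced code, setext headings, and closing `#` sequences.
ATX_HEADING = re.compile(r"^#{1,6}(?:[ \t]|$)")


def hash_text(text: str) -> str:
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def explode(markdown: str, body_start_line: int = 1) -> list[tuple[str, str]]:
    """Split the body into a preamble plus one slice per heading (1-based lines)."""
    lines = markdown.splitlines(keepends=True)
    starts = [
        number
        for number, line in enumerate(lines, start=1)
        if number >= body_start_line and ATX_HEADING.match(line)
    ]
    first = starts[0] if starts else len(lines) + 1
    entries: list[tuple[str, str]] = []
    preamble = "".join(lines[body_start_line - 1 : first - 1])
    if preamble or not starts:
        entries.append(("preamble", preamble))
    for index, start in enumerate(starts):
        # A section ends on the line before the next heading, or at EOF.
        end = starts[index + 1] - 1 if index + 1 < len(starts) else len(lines)
        entries.append((f"section-{index + 1}", "".join(lines[start - 1 : end])))
    return entries


def implode(frontmatter_raw: str, entries: list[tuple[str, str]]) -> str:
    # Concatenate in manifest order; no separators are added.
    return frontmatter_raw + "".join(text for _, text in entries)


frontmatter = "---\ntitle: Demo\n---\n"
source = frontmatter + "Intro\n\n# A\nbody a\n## B\nbody b\n"
rebuilt = implode(frontmatter, explode(source, body_start_line=4))
print(hash_text(rebuilt) == hash_text(source))  # True
```

Because nothing is inserted between slices, any edit to an exploded file shows up as a hash mismatch rather than as silent reformatting.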
|
|
||||||
|
|
||||||
|
def _entry_file_for_heading(
|
||||||
|
heading: Heading,
|
||||||
|
index: int,
|
||||||
|
unit_id: str,
|
||||||
|
variant: str,
|
||||||
|
hierarchy: dict[int, str],
|
||||||
|
) -> str:
|
||||||
|
filename = f"{index:02d}-{unit_id}.md"
|
||||||
|
if variant == "flat":
|
||||||
|
return f"sections/{filename}"
|
||||||
|
|
||||||
|
for level in list(hierarchy):
|
||||||
|
if level >= heading.level:
|
||||||
|
del hierarchy[level]
|
||||||
|
parents = [hierarchy[level] for level in sorted(hierarchy) if level < heading.level]
|
||||||
|
hierarchy[heading.level] = f"{index:02d}-{unit_id}"
|
||||||
|
return str(Path(*parents, filename)) if parents else filename
|
||||||
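`_entry_file_for_heading` keeps a map from heading level to path segment for the currently open headings. In the `hierarchical` variant each section therefore lands in a directory named after its ancestors, while `flat` puts every section under `sections/`. The standalone sketch below applies the same bookkeeping to `(level, unit_id)` pairs instead of `Heading` objects, and uses `PurePosixPath` so the output uses `/` on every platform (the module uses `Path`, whose separator follows the host OS).

```python
from pathlib import PurePosixPath


def entry_file(level: int, index: int, unit_id: str, variant: str, hierarchy: dict[int, str]) -> str:
    filename = f"{index:02d}-{unit_id}.md"
    if variant == "flat":
        return f"sections/{filename}"
    # Close headings at this level or deeper; what remains are the ancestors.
    for open_level in list(hierarchy):
        if open_level >= level:
            del hierarchy[open_level]
    parents = [hierarchy[open_level] for open_level in sorted(hierarchy)]
    hierarchy[level] = f"{index:02d}-{unit_id}"
    return str(PurePosixPath(*parents, filename))


headings = [(1, "intro"), (2, "setup"), (3, "install"), (2, "usage"), (1, "appendix")]
open_headings: dict[int, str] = {}
for position, (level, unit_id) in enumerate(headings, start=1):
    print(entry_file(level, position, unit_id, "hierarchical", open_headings))
# 01-intro.md
# 01-intro/02-setup.md
# 01-intro/02-setup/03-install.md
# 01-intro/04-usage.md
# 05-appendix.md
```

A parent's own text stays in `01-intro.md` beside the `01-intro/` directory holding its children, and the global index prefix keeps implode order unambiguous.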
|
|
||||||
|
|
||||||
|
def _entry_from_mapping(data: Any) -> ExplodeEntry:
|
||||||
|
if not isinstance(data, dict):
|
||||||
|
raise ExplodeError("Explode manifest entry must be a mapping")
|
||||||
|
return ExplodeEntry(
|
||||||
|
kind=str(data["kind"]),
|
||||||
|
file=str(data["file"]),
|
||||||
|
order=int(data["order"]),
|
||||||
|
unit_id=str(data["unit_id"]),
|
||||||
|
line_start=int(data["line_start"]),
|
||||||
|
line_end=int(data["line_end"]),
|
||||||
|
heading_level=int(data["heading_level"]) if data.get("heading_level") is not None else None,
|
||||||
|
heading_text=str(data["heading_text"]) if data.get("heading_text") is not None else None,
|
||||||
|
content_hash=str(data.get("content_hash", "")),
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _safe_entry_path(root: Path, relative_path: str) -> Path:
|
||||||
|
path = Path(relative_path)
|
||||||
|
if path.is_absolute():
|
||||||
|
raise ExplodeError(f"Exploded entry path must be relative: {relative_path}")
|
||||||
|
resolved = (root / path).resolve()
|
||||||
|
try:
|
||||||
|
resolved.relative_to(root.resolve())
|
||||||
|
except ValueError as exc:
|
||||||
|
raise ExplodeError(f"Exploded entry path escapes directory: {relative_path}") from exc
|
||||||
|
return resolved
|
||||||
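`_safe_entry_path` (and its twin `_safe_output_path` in the literate engine) rejects absolute paths and any relative path that resolves outside the root. That matters because implode reads each `file` value straight from a YAML manifest. The sketch below shows the same check in isolation; `UnsafePathError` is a hypothetical stand-in for `ExplodeError`.

```python
import tempfile
from pathlib import Path


class UnsafePathError(ValueError):
    """Hypothetical stand-in for ExplodeError in this sketch."""


def safe_child(root: Path, relative_path: str) -> Path:
    path = Path(relative_path)
    if path.is_absolute():
        raise UnsafePathError(f"path must be relative: {relative_path}")
    resolved = (root / path).resolve()
    try:
        # relative_to raises ValueError when `resolved` is not under `root`.
        resolved.relative_to(root.resolve())
    except ValueError as exc:
        raise UnsafePathError(f"path escapes directory: {relative_path}") from exc
    return resolved


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    print(safe_child(root, "sections/01-intro.md").name)  # 01-intro.md
    for bad in ("../outside.md", "sections/../../x.md"):
        try:
            safe_child(root, bad)
        except UnsafePathError as exc:
            print(exc)
```

Resolving both sides before comparing also normalises `..` segments and symlinked temp directories, so the containment test compares like with like.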
|
|
||||||
|
|
||||||
|
def _split_frontmatter_raw(markdown: str) -> tuple[str, int]:
|
||||||
|
if not markdown.startswith("---\n"):
|
||||||
|
return "", 1
|
||||||
|
end = markdown.find("\n---", 4)
|
||||||
|
if end == -1:
|
||||||
|
return "", 1
|
||||||
|
closing_end = markdown.find("\n", end + 4)
|
||||||
|
if closing_end == -1:
|
||||||
|
closing_end = len(markdown)
|
||||||
|
else:
|
||||||
|
closing_end += 1
|
||||||
|
frontmatter_raw = markdown[:closing_end]
|
||||||
|
return frontmatter_raw, frontmatter_raw.count("\n") + 1
|
||||||
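`_split_frontmatter_raw` returns the frontmatter exactly as written (so implode can restore it verbatim) together with the 1-based line on which the body starts, which `_explode_entries` uses as its `body_start_line`. A copy of its logic run on a few concrete inputs makes that contract explicit; the function name here is illustrative.

```python
def split_frontmatter_raw(markdown: str) -> tuple[str, int]:
    """Return (raw frontmatter, 1-based body start line), mirroring the module."""
    if not markdown.startswith("---\n"):
        return "", 1
    end = markdown.find("\n---", 4)
    if end == -1:
        return "", 1  # unterminated block: treat everything as body
    closing_end = markdown.find("\n", end + 4)
    # Keep the newline after the closing `---` with the frontmatter, if present.
    closing_end = len(markdown) if closing_end == -1 else closing_end + 1
    raw = markdown[:closing_end]
    return raw, raw.count("\n") + 1


print(split_frontmatter_raw("---\ntitle: Demo\n---\n# Heading\n"))
# ('---\ntitle: Demo\n---\n', 4)
print(split_frontmatter_raw("# No frontmatter\n"))
# ('', 1)
```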
|
|
||||||
|
|
||||||
|
def _heading_title(heading: Heading) -> str:
|
||||||
|
text = re.sub(r"\s+\{#[A-Za-z0-9_.:-]+\}\s*$", "", heading.text.strip())
|
||||||
|
return text or "section"
|
||||||
|
|
||||||
|
|
||||||
|
def _dedupe_id(unit_id: str, used_ids: dict[str, int]) -> str:
|
||||||
|
count = used_ids.get(unit_id, 0) + 1
|
||||||
|
used_ids[unit_id] = count
|
||||||
|
return unit_id if count == 1 else f"{unit_id}-{count}"
|
||||||
|
|
||||||
|
|
||||||
|
def _slug(value: str) -> str:
|
||||||
|
slug = re.sub(r"[^a-z0-9_.:-]+", "-", value.strip().lower())
|
||||||
|
slug = re.sub(r"-+", "-", slug).strip("-")
|
||||||
|
return slug or "section"
|
||||||
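Both modules derive ids with `_slug` and `_dedupe_id`. Because `_dedupe_id` keeps one counter per base slug, a later heading whose own slug is `getting-started-2` receives the same id the second `Getting Started` already got. In the explode module the index prefix still keeps file names distinct, but in the literate engine `chunks_by_id` would keep only the later of two chunks sharing an id. The sketch below pairs the same slug rule with a variant (not the module's code) that records every issued id in a set; `fallback` models the differing defaults (`"section"` here, `"chunk"` in the literate engine).

```python
import re


def slug(value: str, fallback: str = "section") -> str:
    # Same character class as the module: lowercase ASCII, digits, `_ . : -`.
    text = re.sub(r"[^a-z0-9_.:-]+", "-", value.strip().lower())
    text = re.sub(r"-+", "-", text).strip("-")
    return text or fallback


def dedupe(unit_id: str, issued: set[str]) -> str:
    # Variant: remember every id handed out, not just a per-slug counter.
    candidate, n = unit_id, 1
    while candidate in issued:
        n += 1
        candidate = f"{unit_id}-{n}"
    issued.add(candidate)
    return candidate


issued: set[str] = set()
titles = ["Getting Started", "getting started!", "Getting Started 2", "###"]
print([dedupe(slug(t), issued) for t in titles])
# ['getting-started', 'getting-started-2', 'getting-started-2-2', 'section']
```

Note that non-ASCII letters fall outside the slug character class and are dropped, so `Über` becomes `ber`.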
|
|
||||||
|
|
||||||
|
def _hash_text(text: str) -> str:
|
||||||
|
return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
|
||||||
23
src/markitect_tool/literate/__init__.py
Normal file
@@ -0,0 +1,23 @@
|
|||||||
|
"""Markdown-native literate weave/tangle workflows."""
|
||||||
|
|
||||||
|
from markitect_tool.literate.engine import (
|
||||||
|
CodeChunk,
|
||||||
|
LiterateFile,
|
||||||
|
TangleResult,
|
||||||
|
WeaveResult,
|
||||||
|
discover_code_chunks,
|
||||||
|
tangle_markdown,
|
||||||
|
weave_markdown,
|
||||||
|
write_tangle_files,
|
||||||
|
)
|
||||||
|
|
||||||
|
__all__ = [
|
||||||
|
"CodeChunk",
|
||||||
|
"LiterateFile",
|
||||||
|
"TangleResult",
|
||||||
|
"WeaveResult",
|
||||||
|
"discover_code_chunks",
|
||||||
|
"tangle_markdown",
|
||||||
|
"weave_markdown",
|
||||||
|
"write_tangle_files",
|
||||||
|
]
|
||||||
317
src/markitect_tool/literate/engine.py
Normal file
@@ -0,0 +1,317 @@
|
|||||||
|
"""Literate programming helpers for Markdown fenced code chunks."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import hashlib
|
||||||
|
import re
|
||||||
|
import shlex
|
||||||
|
from dataclasses import asdict, dataclass, field
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
from markdown_it import MarkdownIt
|
||||||
|
|
||||||
|
from markitect_tool.diagnostics import Diagnostic, SourceLocation
|
||||||
|
from markitect_tool.ops import OperationProvenance
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class CodeChunk:
|
||||||
|
"""A named fenced code chunk."""
|
||||||
|
|
||||||
|
chunk_id: str
|
||||||
|
content: str
|
||||||
|
language: str | None = None
|
||||||
|
target_path: str | None = None
|
||||||
|
references: list[str] = field(default_factory=list)
|
||||||
|
source_path: str | None = None
|
||||||
|
line_start: int | None = None
|
||||||
|
line_end: int | None = None
|
||||||
|
content_hash: str = ""
|
||||||
|
|
||||||
|
def to_dict(self) -> dict[str, Any]:
|
||||||
|
return {key: value for key, value in asdict(self).items() if value not in (None, [], "")}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class LiterateFile:
|
||||||
|
"""One generated file from tangling."""
|
||||||
|
|
||||||
|
path: str
|
||||||
|
content: str
|
||||||
|
chunk_ids: list[str]
|
||||||
|
|
||||||
|
def to_dict(self) -> dict[str, Any]:
|
||||||
|
return asdict(self)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class TangleResult:
|
||||||
|
"""Result of tangling Markdown code chunks."""
|
||||||
|
|
||||||
|
files: list[LiterateFile]
|
||||||
|
chunks: list[CodeChunk]
|
||||||
|
diagnostics: list[Diagnostic] = field(default_factory=list)
|
||||||
|
provenance: list[OperationProvenance] = field(default_factory=list)
|
||||||
|
|
||||||
|
@property
|
||||||
|
def valid(self) -> bool:
|
||||||
|
return not any(diagnostic.severity == "error" for diagnostic in self.diagnostics)
|
||||||
|
|
||||||
|
def to_dict(self) -> dict[str, Any]:
|
||||||
|
return {
|
||||||
|
"valid": self.valid,
|
||||||
|
"files": [file.to_dict() for file in self.files],
|
||||||
|
"chunks": [chunk.to_dict() for chunk in self.chunks],
|
||||||
|
"diagnostics": [diagnostic.to_dict() for diagnostic in self.diagnostics],
|
||||||
|
"provenance": [event.to_dict() for event in self.provenance],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class WeaveResult:
|
||||||
|
"""Result of weaving Markdown documentation with a chunk index."""
|
||||||
|
|
||||||
|
markdown: str
|
||||||
|
chunks: list[CodeChunk]
|
||||||
|
|
||||||
|
def to_dict(self) -> dict[str, Any]:
|
||||||
|
return {
|
||||||
|
"markdown": self.markdown,
|
||||||
|
"chunks": [chunk.to_dict() for chunk in self.chunks],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
_CHUNK_REF_RE = re.compile(r"<<(?P<id>[A-Za-z0-9_.:-]+)>>")
|
||||||
|
_CHUNK_LINE_REF_RE = re.compile(r"^(?P<indent>[ \t]*)<<(?P<id>[A-Za-z0-9_.:-]+)>>[ \t]*$", re.MULTILINE)
|
||||||
|
|
||||||
|
|
||||||
|
def discover_code_chunks(
|
||||||
|
markdown: str,
|
||||||
|
*,
|
||||||
|
source_path: str | Path | None = None,
|
||||||
|
) -> list[CodeChunk]:
|
||||||
|
"""Discover fenced code chunks that carry an `{#id}` (or `id=`) attribute, in document order."""
|
||||||
|
|
||||||
|
parser = MarkdownIt("commonmark", {"tables": True}).enable("table")
|
||||||
|
chunks: list[CodeChunk] = []
|
||||||
|
used_ids: dict[str, int] = {}
|
||||||
|
for token in parser.parse(markdown):
|
||||||
|
if token.type != "fence":
|
||||||
|
continue
|
||||||
|
attrs = _parse_fence_info(token.info)
|
||||||
|
chunk_id = attrs.get("id")
|
||||||
|
if not chunk_id:
|
||||||
|
continue
|
||||||
|
chunk_id = _dedupe_id(_slug(chunk_id), used_ids)
|
||||||
|
line_start = token.map[0] + 1 if token.map else None
|
||||||
|
line_end = token.map[1] if token.map else None
|
||||||
|
chunks.append(
|
||||||
|
CodeChunk(
|
||||||
|
chunk_id=chunk_id,
|
||||||
|
content=token.content,
|
||||||
|
language=attrs.get("language"),
|
||||||
|
target_path=attrs.get("tangle") or attrs.get("target"),
|
||||||
|
references=_chunk_references(token.content),
|
||||||
|
source_path=str(source_path) if source_path else None,
|
||||||
|
line_start=line_start,
|
||||||
|
line_end=line_end,
|
||||||
|
content_hash=_hash_text(token.content),
|
||||||
|
)
|
||||||
|
)
|
||||||
|
return chunks
|
||||||
|
|
||||||
|
|
||||||
|
def tangle_markdown(
|
||||||
|
markdown: str,
|
||||||
|
*,
|
||||||
|
source_path: str | Path | None = None,
|
||||||
|
) -> TangleResult:
|
||||||
|
"""Tangle chunks that declare a `tangle` or `target` path into files, expanding `<<chunk-id>>` references."""
|
||||||
|
|
||||||
|
chunks = discover_code_chunks(markdown, source_path=source_path)
|
||||||
|
chunks_by_id = {chunk.chunk_id: chunk for chunk in chunks}
|
||||||
|
diagnostics: list[Diagnostic] = []
|
||||||
|
provenance: list[OperationProvenance] = []
|
||||||
|
target_chunks: dict[str, list[CodeChunk]] = {}
|
||||||
|
for chunk in chunks:
|
||||||
|
if chunk.target_path:
|
||||||
|
target_chunks.setdefault(chunk.target_path, []).append(chunk)
|
||||||
|
|
||||||
|
files: list[LiterateFile] = []
|
||||||
|
for target_path, grouped_chunks in target_chunks.items():
|
||||||
|
rendered_parts: list[str] = []
|
||||||
|
for chunk in grouped_chunks:
|
||||||
|
rendered_parts.append(_expand_chunk(chunk, chunks_by_id, diagnostics, []))
|
||||||
|
provenance.append(
|
||||||
|
OperationProvenance(
|
||||||
|
operation="literate.tangle",
|
||||||
|
source_path=chunk.source_path,
|
||||||
|
line_start=chunk.line_start,
|
||||||
|
line_end=chunk.line_end,
|
||||||
|
target_path=target_path,
|
||||||
|
dependencies=[chunk.source_path] if chunk.source_path else [],
|
||||||
|
metadata={"chunk_id": chunk.chunk_id, "references": chunk.references},
|
||||||
|
)
|
||||||
|
)
|
||||||
|
files.append(
|
||||||
|
LiterateFile(
|
||||||
|
path=target_path,
|
||||||
|
content=_join_tangled_parts(rendered_parts),
|
||||||
|
chunk_ids=[chunk.chunk_id for chunk in grouped_chunks],
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
return TangleResult(
|
||||||
|
files=files,
|
||||||
|
chunks=chunks,
|
||||||
|
diagnostics=diagnostics,
|
||||||
|
provenance=provenance,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def weave_markdown(
|
||||||
|
markdown: str,
|
||||||
|
*,
|
||||||
|
source_path: str | Path | None = None,
|
||||||
|
) -> WeaveResult:
|
||||||
|
"""Append a deterministic chunk index to human-readable Markdown."""
|
||||||
|
|
||||||
|
chunks = discover_code_chunks(markdown, source_path=source_path)
|
||||||
|
if not chunks:
|
||||||
|
return WeaveResult(markdown=markdown, chunks=[])
|
||||||
|
|
||||||
|
lines = [markdown.rstrip(), "", "## Code Chunk Index", ""]
|
||||||
|
for chunk in chunks:
|
||||||
|
target = f" -> `{chunk.target_path}`" if chunk.target_path else ""
|
||||||
|
refs = f"; refs: {', '.join(f'`{ref}`' for ref in chunk.references)}" if chunk.references else ""
|
||||||
|
location = f" line {chunk.line_start}" if chunk.line_start else ""
|
||||||
|
lines.append(f"- `{chunk.chunk_id}`{target}{refs}{location}")
|
||||||
|
return WeaveResult(markdown="\n".join(lines).rstrip() + "\n", chunks=chunks)
|
||||||
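`weave_markdown` leaves the prose untouched and appends a `## Code Chunk Index` section with one bullet per named chunk. The sketch below rebuilds that output from plain dictionaries (an assumption made to keep it self-contained; the module uses `CodeChunk`) so the exact line format is visible.

```python
def weave(markdown: str, chunks: list[dict]) -> str:
    """Append a chunk index using the same line format as `weave_markdown`."""
    if not chunks:
        return markdown
    lines = [markdown.rstrip(), "", "## Code Chunk Index", ""]
    for chunk in chunks:
        target = f" -> `{chunk['target']}`" if chunk.get("target") else ""
        refs = chunk.get("refs") or []
        ref_text = "; refs: " + ", ".join(f"`{ref}`" for ref in refs) if refs else ""
        location = f" line {chunk['line']}" if chunk.get("line") else ""
        lines.append(f"- `{chunk['id']}`{target}{ref_text}{location}")
    return "\n".join(lines).rstrip() + "\n"


print(weave("# Doc\n\nText.\n", [
    {"id": "main", "target": "app.py", "refs": ["body"], "line": 5},
    {"id": "body", "line": 9},
]))
```

The index depends only on chunk order and attributes, so weaving the same source twice yields identical output.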
|
|
||||||
|
|
||||||
|
def write_tangle_files(result: TangleResult, output_dir: str | Path) -> list[str]:
|
||||||
|
"""Write tangled files under an output directory."""
|
||||||
|
|
||||||
|
root = Path(output_dir)
|
||||||
|
root.mkdir(parents=True, exist_ok=True)
|
||||||
|
written: list[str] = []
|
||||||
|
for file in result.files:
|
||||||
|
target = _safe_output_path(root, file.path)
|
||||||
|
target.parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
target.write_text(file.content, encoding="utf-8")
|
||||||
|
written.append(str(target))
|
||||||
|
return written
|
||||||
|
|
||||||
|
|
||||||
|
def _expand_chunk(
|
||||||
|
chunk: CodeChunk,
|
||||||
|
chunks_by_id: dict[str, CodeChunk],
|
||||||
|
diagnostics: list[Diagnostic],
|
||||||
|
stack: list[str],
|
||||||
|
) -> str:
|
||||||
|
if chunk.chunk_id in stack:
|
||||||
|
diagnostics.append(
|
||||||
|
Diagnostic(
|
||||||
|
severity="error",
|
||||||
|
code="literate.chunk_cycle",
|
||||||
|
message="Cyclic chunk reference: " + " -> ".join(stack + [chunk.chunk_id]),
|
||||||
|
source=SourceLocation(path=chunk.source_path, line=chunk.line_start),
|
||||||
|
)
|
||||||
|
)
|
||||||
|
return f"<<{chunk.chunk_id}>>"
|
||||||
|
|
||||||
|
def replace_line(match: re.Match[str]) -> str:
|
||||||
|
indent = match.group("indent")
|
||||||
|
expanded = _expand_reference(match.group("id"), chunks_by_id, diagnostics, stack + [chunk.chunk_id], chunk)
|
||||||
|
return "\n".join(f"{indent}{line}" if line else line for line in expanded.splitlines())
|
||||||
|
|
||||||
|
rendered = _CHUNK_LINE_REF_RE.sub(replace_line, chunk.content)
|
||||||
|
|
||||||
|
def replace_inline(match: re.Match[str]) -> str:
|
||||||
|
return _expand_reference(match.group("id"), chunks_by_id, diagnostics, stack + [chunk.chunk_id], chunk)
|
||||||
|
|
||||||
|
return _CHUNK_REF_RE.sub(replace_inline, rendered)
|
||||||
|
|
||||||
|
|
||||||
|
def _expand_reference(
|
||||||
|
chunk_id: str,
|
||||||
|
chunks_by_id: dict[str, CodeChunk],
|
||||||
|
diagnostics: list[Diagnostic],
|
||||||
|
stack: list[str],
|
||||||
|
source_chunk: CodeChunk,
|
||||||
|
) -> str:
|
||||||
|
referenced = chunks_by_id.get(chunk_id)
|
||||||
|
if not referenced:
|
||||||
|
diagnostics.append(
|
||||||
|
Diagnostic(
|
||||||
|
severity="error",
|
||||||
|
code="literate.missing_chunk",
|
||||||
|
message=f"Missing chunk reference `{chunk_id}`",
|
||||||
|
source=SourceLocation(path=source_chunk.source_path, line=source_chunk.line_start),
|
||||||
|
)
|
||||||
|
)
|
||||||
|
return f"<<{chunk_id}>>"
|
||||||
|
return _expand_chunk(referenced, chunks_by_id, diagnostics, stack)
|
||||||
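`_expand_chunk` resolves references depth first, carrying a stack of chunk ids so that a cycle becomes a `literate.chunk_cycle` diagnostic instead of infinite recursion, and it re-indents a multi-line expansion to match a reference that sits alone on its line. The sketch below is a simplified variant that collects messages as strings. It handles each line once (whole-line reference or inline substitution), whereas the module runs its inline pass over text the line pass has already expanded, so an unresolved `<<id>>` left behind by the first pass is looked up again and its diagnostic is recorded twice.

```python
import re

REF = re.compile(r"<<(?P<id>[A-Za-z0-9_.:-]+)>>")
LINE_REF = re.compile(r"^(?P<indent>[ \t]*)<<(?P<id>[A-Za-z0-9_.:-]+)>>[ \t]*$")


def expand(chunk_id: str, chunks: dict[str, str], errors: list[str], stack: tuple[str, ...] = ()) -> str:
    if chunk_id not in chunks:
        errors.append(f"missing chunk `{chunk_id}`")
        return f"<<{chunk_id}>>"  # leave the reference visible in the output
    if chunk_id in stack:
        errors.append("cyclic chunk reference: " + " -> ".join((*stack, chunk_id)))
        return f"<<{chunk_id}>>"
    stack = (*stack, chunk_id)
    out: list[str] = []
    for line in chunks[chunk_id].splitlines():
        whole = LINE_REF.match(line)
        if whole:
            # Reference alone on its line: indent every expanded line to match.
            indent = whole.group("indent")
            body = expand(whole.group("id"), chunks, errors, stack)
            out.extend([indent + sub if sub else sub for sub in body.splitlines()])
        else:
            # Inline references are substituted verbatim.
            out.append(REF.sub(lambda m: expand(m.group("id"), chunks, errors, stack), line))
    return "\n".join(out)


errors: list[str] = []
print(expand("main", {"main": "def main():\n    <<body>>\n", "body": "print('hi')\nreturn 0"}, errors))
# def main():
#     print('hi')
#     return 0
```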
|
|
||||||
|
|
||||||
|
def _join_tangled_parts(parts: list[str]) -> str:
|
||||||
|
rendered = "\n".join(part.rstrip("\n") for part in parts if part is not None)
|
||||||
|
return rendered.rstrip() + "\n" if rendered else ""
|
||||||
|
|
||||||
|
|
||||||
|
def _safe_output_path(root: Path, relative_path: str) -> Path:
|
||||||
|
path = Path(relative_path)
|
||||||
|
if path.is_absolute():
|
||||||
|
raise ValueError(f"Tangle target must be relative: {relative_path}")
|
||||||
|
resolved = (root / path).resolve()
|
||||||
|
try:
|
||||||
|
resolved.relative_to(root.resolve())
|
||||||
|
except ValueError as exc:
|
||||||
|
raise ValueError(f"Tangle target escapes output directory: {relative_path}") from exc
|
||||||
|
return resolved
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_fence_info(info: str) -> dict[str, str]:
|
||||||
|
match = re.match(r"^(?P<language>[^\s{]+)?(?:\s*\{(?P<attrs>.*)\})?\s*$", info.strip())
|
||||||
|
if not match:
|
||||||
|
return {"language": info.strip()} if info.strip() else {}
|
||||||
|
attrs = _parse_attrs(match.group("attrs") or "")
|
||||||
|
language = match.group("language")
|
||||||
|
if language:
|
||||||
|
attrs["language"] = language
|
||||||
|
return attrs
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_attrs(raw: str) -> dict[str, str]:
|
||||||
|
attrs: dict[str, str] = {}
|
||||||
|
for part in shlex.split(raw):
|
||||||
|
if part.startswith("#") and len(part) > 1:
|
||||||
|
attrs["id"] = part[1:]
|
||||||
|
continue
|
||||||
|
if "=" not in part:
|
||||||
|
attrs[part] = "true"
|
||||||
|
continue
|
||||||
|
key, value = part.split("=", 1)
|
||||||
|
attrs[key.strip()] = value.strip()
|
||||||
|
return attrs
|
||||||
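`_parse_fence_info` splits a fence info string into a language tag plus a `{...}` attribute block, and `_parse_attrs` reads `#id`, `key=value`, and bare flags from that block with `shlex`, so quoted values may contain spaces. The standalone sketch below combines the two. It allows zero whitespace before `{`, so an info string that is only an attribute block, such as `{#helper}`, still yields an `id`.

```python
import re
import shlex

# Sketch assumption: `\s*` before `{` so an info string of only `{...}` also parses.
FENCE_INFO_RE = re.compile(r"^(?P<language>[^\s{]+)?(?:\s*\{(?P<attrs>.*)\})?\s*$")


def parse_fence_info(info: str) -> dict[str, str]:
    info = info.strip()
    match = FENCE_INFO_RE.match(info)
    if not match:
        # Unrecognised shape: keep the whole string as the language tag.
        return {"language": info} if info else {}
    attrs: dict[str, str] = {}
    for part in shlex.split(match.group("attrs") or ""):
        if part.startswith("#") and len(part) > 1:
            attrs["id"] = part[1:]  # {#chunk-id}
        elif "=" in part:
            key, value = part.split("=", 1)  # {tangle=src/app.py}
            attrs[key.strip()] = value.strip()
        else:
            attrs[part] = "true"  # bare flag
    if match.group("language"):
        attrs["language"] = match.group("language")
    return attrs


print(parse_fence_info("python {#main tangle=src/app.py}"))
# {'id': 'main', 'tangle': 'src/app.py', 'language': 'python'}
print(parse_fence_info('text {#notes title="two words"}'))
# {'id': 'notes', 'title': 'two words', 'language': 'text'}
```

`shlex.split` raises `ValueError` on an unbalanced quote, so a malformed attribute block surfaces as an exception rather than a partial parse.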
|
|
||||||
|
|
||||||
|
def _chunk_references(content: str) -> list[str]:
|
||||||
|
return [match.group("id") for match in _CHUNK_REF_RE.finditer(content)]
|
||||||
|
|
||||||
|
|
||||||
|
def _dedupe_id(unit_id: str, used_ids: dict[str, int]) -> str:
|
||||||
|
count = used_ids.get(unit_id, 0) + 1
|
||||||
|
used_ids[unit_id] = count
|
||||||
|
return unit_id if count == 1 else f"{unit_id}-{count}"
|
||||||
|
|
||||||
|
|
||||||
|
def _slug(value: str) -> str:
|
||||||
|
slug = re.sub(r"[^a-z0-9_.:-]+", "-", value.strip().lower())
|
||||||
|
slug = re.sub(r"-+", "-", slug).strip("-")
|
||||||
|
return slug or "chunk"
|
||||||
|
|
||||||
|
|
||||||
|
def _hash_text(text: str) -> str:
|
||||||
|
return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
|
||||||
@@ -4,6 +4,7 @@ from markitect_tool.ops.engine import (
|
|||||||
ComposeResult,
|
ComposeResult,
|
||||||
IncludeError,
|
IncludeError,
|
||||||
IncludeResult,
|
IncludeResult,
|
||||||
|
OperationProvenance,
|
||||||
TransformResult,
|
TransformResult,
|
||||||
compose_files,
|
compose_files,
|
||||||
resolve_includes,
|
resolve_includes,
|
||||||
@@ -14,6 +15,7 @@ __all__ = [
|
|||||||
"ComposeResult",
|
"ComposeResult",
|
||||||
"IncludeError",
|
"IncludeError",
|
||||||
"IncludeResult",
|
"IncludeResult",
|
||||||
|
"OperationProvenance",
|
||||||
"TransformResult",
|
"TransformResult",
|
||||||
"compose_files",
|
"compose_files",
|
||||||
"resolve_includes",
|
"resolve_includes",
|
||||||
|
|||||||
@@ -9,6 +9,7 @@ from pathlib import Path
|
|||||||
from typing import Any
|
from typing import Any
|
||||||
|
|
||||||
import yaml
|
import yaml
|
||||||
|
from markdown_it import MarkdownIt
|
||||||
|
|
||||||
from markitect_tool.core import parse_markdown
|
from markitect_tool.core import parse_markdown
|
||||||
from markitect_tool.query import extract_document
|
from markitect_tool.query import extract_document
|
||||||
@@ -18,15 +19,46 @@ class IncludeError(ValueError):
|
|||||||
"""Raised when include resolution cannot continue."""
|
"""Raised when include resolution cannot continue."""
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class OperationProvenance:
|
||||||
|
"""Structured provenance for deterministic Markdown operations."""
|
||||||
|
|
||||||
|
operation: str
|
||||||
|
source_path: str | None = None
|
||||||
|
line_start: int | None = None
|
||||||
|
line_end: int | None = None
|
||||||
|
target_path: str | None = None
|
||||||
|
dependencies: list[str] = field(default_factory=list)
|
||||||
|
metadata: dict[str, Any] = field(default_factory=dict)
|
||||||
|
|
||||||
|
def to_dict(self) -> dict[str, Any]:
|
||||||
|
data = {
|
||||||
|
"operation": self.operation,
|
||||||
|
"source_path": self.source_path,
|
||||||
|
"line_start": self.line_start,
|
||||||
|
"line_end": self.line_end,
|
||||||
|
"target_path": self.target_path,
|
||||||
|
"dependencies": self.dependencies or None,
|
||||||
|
"metadata": self.metadata or None,
|
||||||
|
}
|
||||||
|
return {key: value for key, value in data.items() if value is not None}
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
@dataclass(frozen=True)
|
||||||
class TransformResult:
|
class TransformResult:
|
||||||
"""Result of a deterministic Markdown transform."""
|
"""Result of a deterministic Markdown transform."""
|
||||||
|
|
||||||
markdown: str
|
markdown: str
|
||||||
operations: list[str] = field(default_factory=list)
|
operations: list[str] = field(default_factory=list)
|
||||||
|
provenance: list[OperationProvenance] = field(default_factory=list)
|
||||||
|
|
||||||
def to_dict(self) -> dict[str, Any]:
|
def to_dict(self) -> dict[str, Any]:
|
||||||
return asdict(self)
|
data: dict[str, Any] = {
|
||||||
|
"markdown": self.markdown,
|
||||||
|
"operations": self.operations,
|
||||||
|
"provenance": [event.to_dict() for event in self.provenance],
|
||||||
|
}
|
||||||
|
return {key: value for key, value in data.items() if value}
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
@dataclass(frozen=True)
|
||||||
@@ -46,9 +78,15 @@ class IncludeResult:
|
|||||||
|
|
||||||
markdown: str
|
markdown: str
|
||||||
included_paths: list[str] = field(default_factory=list)
|
included_paths: list[str] = field(default_factory=list)
|
||||||
|
provenance: list[OperationProvenance] = field(default_factory=list)
|
||||||
|
|
||||||
def to_dict(self) -> dict[str, Any]:
|
def to_dict(self) -> dict[str, Any]:
|
||||||
return asdict(self)
|
data: dict[str, Any] = {
|
||||||
|
"markdown": self.markdown,
|
||||||
|
"included_paths": self.included_paths,
|
||||||
|
"provenance": [event.to_dict() for event in self.provenance],
|
||||||
|
}
|
||||||
|
return {key: value for key, value in data.items() if value}
|
||||||
|
|
||||||
|
|
||||||
_COMMENT_INCLUDE_RE = re.compile(r"<!--\s*mkt:include\s+(?P<attrs>.*?)\s*-->", re.DOTALL)
|
_COMMENT_INCLUDE_RE = re.compile(r"<!--\s*mkt:include\s+(?P<attrs>.*?)\s*-->", re.DOTALL)
|
||||||
@@ -68,15 +106,30 @@ def transform_markdown(
|
|||||||
"""Apply deterministic operations to one Markdown document."""
|
"""Apply deterministic operations to one Markdown document."""
|
||||||
|
|
||||||
operations: list[str] = []
|
operations: list[str] = []
|
||||||
|
provenance: list[OperationProvenance] = []
|
||||||
frontmatter, body = _split_frontmatter(markdown)
|
frontmatter, body = _split_frontmatter(markdown)
|
||||||
|
|
||||||
if set_frontmatter:
|
if set_frontmatter:
|
||||||
frontmatter = _deep_merge(frontmatter, set_frontmatter)
|
frontmatter = _deep_merge(frontmatter, set_frontmatter)
|
||||||
operations.append("set_frontmatter")
|
operations.append("set_frontmatter")
|
||||||
|
provenance.append(
|
||||||
|
OperationProvenance(
|
||||||
|
operation="set_frontmatter",
|
||||||
|
source_path=source_path,
|
||||||
|
metadata={"keys": sorted(set_frontmatter.keys())},
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
if heading_delta:
|
if heading_delta:
|
||||||
body = shift_heading_levels(body, heading_delta)
|
body, affected_lines = _shift_heading_levels(body, heading_delta)
|
||||||
operations.append(f"shift_headings:{heading_delta}")
|
operations.append(f"shift_headings:{heading_delta}")
|
||||||
|
provenance.append(
|
||||||
|
OperationProvenance(
|
||||||
|
operation="shift_headings",
|
||||||
|
source_path=source_path,
|
||||||
|
metadata={"delta": heading_delta, "affected_lines": affected_lines},
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
if extract_selector:
|
if extract_selector:
|
||||||
document_text = _join_frontmatter(frontmatter, body) if frontmatter else body
|
document_text = _join_frontmatter(frontmatter, body) if frontmatter else body
|
||||||
@@ -84,24 +137,71 @@ def transform_markdown(
|
|||||||
body = "\n\n".join(extract_document(document, extract_selector))
|
body = "\n\n".join(extract_document(document, extract_selector))
|
||||||
frontmatter = {}
|
frontmatter = {}
|
||||||
operations.append(f"extract:{extract_selector}")
|
operations.append(f"extract:{extract_selector}")
|
||||||
|
provenance.append(
|
||||||
|
OperationProvenance(
|
||||||
|
operation="extract",
|
||||||
|
source_path=source_path,
|
||||||
|
metadata={"selector": extract_selector},
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
if strip_frontmatter:
|
if strip_frontmatter:
|
||||||
frontmatter = {}
|
frontmatter = {}
|
||||||
operations.append("strip_frontmatter")
|
operations.append("strip_frontmatter")
|
||||||
|
provenance.append(
|
||||||
|
OperationProvenance(
|
||||||
|
operation="strip_frontmatter",
|
||||||
|
source_path=source_path,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
return TransformResult(markdown=_join_frontmatter(frontmatter, body), operations=operations)
|
return TransformResult(
|
||||||
|
markdown=_join_frontmatter(frontmatter, body),
|
||||||
|
operations=operations,
|
||||||
|
provenance=provenance,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
def shift_heading_levels(markdown: str, delta: int) -> str:
|
def shift_heading_levels(markdown: str, delta: int) -> str:
|
||||||
"""Shift ATX heading levels by delta while clamping to levels 1 through 6."""
|
"""Shift ATX heading levels by delta while clamping to levels 1 through 6."""
|
||||||
|
|
||||||
def replace(match: re.Match[str]) -> str:
|
shifted, _affected_lines = _shift_heading_levels(markdown, delta)
|
||||||
|
return shifted
|
||||||
|
|
||||||
|
|
||||||
|
def _shift_heading_levels(markdown: str, delta: int) -> tuple[str, list[int]]:
|
||||||
|
ignored_lines = _code_line_numbers(markdown)
|
||||||
|
affected_lines: list[int] = []
|
||||||
|
rendered_lines: list[str] = []
|
||||||
|
|
||||||
|
for line_number, line in enumerate(markdown.splitlines(keepends=True), start=1):
|
||||||
|
if line_number in ignored_lines:
|
||||||
|
rendered_lines.append(line)
|
||||||
|
continue
|
||||||
|
line_body = line.rstrip("\r\n")
|
||||||
|
line_ending = line[len(line_body) :]
|
||||||
|
match = _HEADING_RE.match(line_body)
|
||||||
|
if not match:
|
||||||
|
rendered_lines.append(line)
|
||||||
|
continue
|
||||||
marks = match.group(1)
|
marks = match.group(1)
|
||||||
suffix = match.group(2)
|
suffix = match.group(2)
|
||||||
level = min(max(len(marks) + delta, 1), 6)
|
level = min(max(len(marks) + delta, 1), 6)
|
||||||
return f"{'#' * level}{suffix}"
|
rendered_lines.append(f"{'#' * level}{suffix}{line_ending}")
|
||||||
|
affected_lines.append(line_number)
|
||||||
|
|
||||||
return _HEADING_RE.sub(replace, markdown)
|
return "".join(rendered_lines), affected_lines
|
||||||
|
|
||||||
|
|
||||||
|
def _code_line_numbers(markdown: str) -> set[int]:
|
||||||
|
parser = MarkdownIt("commonmark", {"tables": True}).enable("table")
|
||||||
|
ignored_lines: set[int] = set()
|
||||||
|
for token in parser.parse(markdown):
|
||||||
|
if token.type not in {"fence", "code_block"} or not token.map:
|
||||||
|
continue
|
||||||
|
start, end = token.map
|
||||||
|
ignored_lines.update(range(start + 1, end + 1))
|
||||||
|
return ignored_lines
|
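The new `_shift_heading_levels` does two things. It skips lines that `_code_line_numbers` reports as fenced or indented code, and it clamps every shifted level to the range 1 through 6. markdown-it reports `token.map` as a 0-based, end-exclusive `[start, end)` line range, which is why the diff adds 1 to both ends to get 1-based line numbers. The following is a minimal stdlib-only sketch of the same idea, built on two assumptions. First, the heading pattern is one to six `#` followed by whitespace or end of line, since the diff does not show `_HEADING_RE`. Second, `fenced_line_numbers` is a hypothetical, simplified stand-in for the markdown-it based helper and only recognizes backtick and tilde fences.

```python
from __future__ import annotations

import re

# Assumption: stand-in for the unshown _HEADING_RE (1-6 '#' then space or end of line).
HEADING_RE = re.compile(r"^(#{1,6})(\s.*|)$")


def fenced_line_numbers(markdown: str) -> set[int]:
    """Hypothetical simplified _code_line_numbers: 1-based lines inside ``` or ~~~ fences.

    Fence lines themselves are included, matching markdown-it's token.map span.
    Indented code blocks are NOT detected by this sketch.
    """
    ignored: set[int] = set()
    fence: str | None = None
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.lstrip()
        if fence is None and stripped[:3] in ("```", "~~~"):
            fence = stripped[:3]  # opening fence
            ignored.add(number)
        elif fence is not None:
            ignored.add(number)
            if stripped.startswith(fence):
                fence = None  # closing fence
    return ignored


def shift_headings(markdown: str, delta: int) -> tuple[str, list[int]]:
    """Shift ATX headings by delta, clamp to 1..6, and report affected line numbers."""
    ignored = fenced_line_numbers(markdown)
    rendered: list[str] = []
    affected: list[int] = []
    for number, line in enumerate(markdown.splitlines(keepends=True), start=1):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line ending
        match = None if number in ignored else HEADING_RE.match(body)
        if match is None:
            rendered.append(line)
            continue
        level = min(max(len(match.group(1)) + delta, 1), 6)
        rendered.append("#" * level + match.group(2) + ending)
        affected.append(number)
    return "".join(rendered), affected
```

With `delta=2`, `# Title` becomes `### Title` and `##### Deep` is clamped to six hashes, while a `# comment` inside a fenced block is left alone. Returning the affected line numbers alongside the text is what lets the caller attach line-level provenance.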
||||||
|
|
||||||
|
|
||||||
def compose_files(
|
def compose_files(
|
||||||
@@ -154,18 +254,22 @@ def resolve_includes(
|
|||||||
root = Path(base_dir).resolve()
|
root = Path(base_dir).resolve()
|
||||||
stack = [Path(current_path).resolve()] if current_path else []
|
stack = [Path(current_path).resolve()] if current_path else []
|
||||||
included: list[Path] = []
|
included: list[Path] = []
|
||||||
|
provenance: list[OperationProvenance] = []
|
||||||
resolved = _resolve_include_text(
|
resolved = _resolve_include_text(
|
||||||
markdown,
|
markdown,
|
||||||
root=root,
|
root=root,
|
||||||
current_dir=Path(current_path).resolve().parent if current_path else root,
|
current_dir=Path(current_path).resolve().parent if current_path else root,
|
||||||
|
source_path=Path(current_path).resolve() if current_path else None,
|
||||||
stack=stack,
|
stack=stack,
|
||||||
included=included,
|
included=included,
|
||||||
|
provenance=provenance,
|
||||||
depth=0,
|
depth=0,
|
||||||
max_depth=max_depth,
|
max_depth=max_depth,
|
||||||
)
|
)
|
||||||
return IncludeResult(
|
return IncludeResult(
|
||||||
markdown=resolved,
|
markdown=resolved,
|
||||||
included_paths=[str(path) for path in included],
|
included_paths=[str(path) for path in included],
|
||||||
|
provenance=provenance,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
@@ -174,34 +278,73 @@ def _resolve_include_text(
|
|||||||
*,
|
*,
|
||||||
root: Path,
|
root: Path,
|
||||||
current_dir: Path,
|
current_dir: Path,
|
||||||
|
source_path: Path | None,
|
||||||
stack: list[Path],
|
stack: list[Path],
|
||||||
included: list[Path],
|
included: list[Path],
|
||||||
|
provenance: list[OperationProvenance],
|
||||||
depth: int,
|
depth: int,
|
||||||
max_depth: int,
|
max_depth: int,
|
||||||
) -> str:
|
) -> str:
|
||||||
if depth > max_depth:
|
if depth > max_depth:
|
||||||
raise IncludeError(f"Include depth exceeded max_depth={max_depth}")
|
raise IncludeError(f"Include depth exceeded max_depth={max_depth}")
|
||||||
|
|
||||||
def replace_comment(match: re.Match[str]) -> str:
|
ignored_lines = _code_line_numbers(markdown)
|
||||||
attrs = _parse_include_attrs(match.group("attrs"))
|
rendered_lines: list[str] = []
|
||||||
return _render_include(attrs, root, current_dir, stack, included, depth, max_depth)
|
|
||||||
|
|
||||||
def replace_brace(match: re.Match[str]) -> str:
|
for line_number, line in enumerate(markdown.splitlines(keepends=True), start=1):
|
||||||
attrs = {"path": match.group("path").strip()}
|
if line_number in ignored_lines:
|
||||||
return _render_include(attrs, root, current_dir, stack, included, depth, max_depth)
|
rendered_lines.append(line)
|
||||||
|
continue
|
||||||
|
|
||||||
markdown = _COMMENT_INCLUDE_RE.sub(replace_comment, markdown)
|
def replace_comment(match: re.Match[str]) -> str:
|
||||||
return _BRACE_INCLUDE_RE.sub(replace_brace, markdown)
|
attrs = _parse_include_attrs(match.group("attrs"))
|
||||||
|
return _render_include(
|
||||||
|
attrs,
|
||||||
|
root,
|
||||||
|
current_dir,
|
||||||
|
source_path,
|
||||||
|
stack,
|
||||||
|
included,
|
||||||
|
provenance,
|
||||||
|
depth,
|
||||||
|
max_depth,
|
||||||
|
marker_line=line_number,
|
||||||
|
)
|
||||||
|
|
||||||
|
def replace_brace(match: re.Match[str]) -> str:
|
||||||
|
attrs = {"path": match.group("path").strip()}
|
||||||
|
return _render_include(
|
||||||
|
attrs,
|
||||||
|
root,
|
||||||
|
current_dir,
|
||||||
|
source_path,
|
||||||
|
stack,
|
||||||
|
included,
|
||||||
|
provenance,
|
||||||
|
depth,
|
||||||
|
max_depth,
|
||||||
|
marker_line=line_number,
|
||||||
|
)
|
||||||
|
|
||||||
|
line = _COMMENT_INCLUDE_RE.sub(replace_comment, line)
|
||||||
|
line = _BRACE_INCLUDE_RE.sub(replace_brace, line)
|
||||||
|
rendered_lines.append(line)
|
||||||
|
|
||||||
|
return "".join(rendered_lines)
|
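`_resolve_include_text` no longer applies `re.sub` to the whole document. It walks the document line by line, leaves code lines untouched, and defines the `replace_comment` / `replace_brace` closures inside the loop so that each `_render_include` call receives the 1-based `marker_line` for its provenance record. A minimal sketch of that pattern follows. The `{{include: path}}` marker syntax, the in-memory `files` mapping and the `IncludeEvent` record are all hypothetical, because the real `_COMMENT_INCLUDE_RE` / `_BRACE_INCLUDE_RE` patterns are not shown in this diff. The sketch also omits the recursion and the `max_depth` guard.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical marker syntax; the real include regexes are not shown in the diff.
INCLUDE_RE = re.compile(r"\{\{include:\s*(?P<path>[^}]+?)\s*\}\}")


@dataclass
class IncludeEvent:
    """Hypothetical, trimmed-down analogue of an OperationProvenance record."""

    target_path: str
    line_start: int
    line_end: int


def expand_includes(
    markdown: str,
    files: dict[str, str],
    ignored_lines: frozenset[int] | set[int] = frozenset(),
) -> tuple[str, list[IncludeEvent]]:
    """Replace include markers line by line, recording the marker line of each."""
    events: list[IncludeEvent] = []
    rendered: list[str] = []
    for line_number, line in enumerate(markdown.splitlines(keepends=True), start=1):
        if line_number in ignored_lines:
            rendered.append(line)  # code lines are never expanded
            continue

        # Defined per line so the closure sees the current line_number;
        # re.sub calls it immediately, so late binding is not a problem here.
        def replace(match: re.Match[str]) -> str:
            path = match.group("path")
            events.append(IncludeEvent(path, line_number, line_number))
            return files[path].strip()

        rendered.append(INCLUDE_RE.sub(replace, line))
    return "".join(rendered), events
```

Because substitution happens per line, a marker's line number is known without any offset arithmetic on the whole document, which is the information the new `line_start` / `line_end` provenance fields need.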
||||||
|
|
||||||
|
|
||||||
def _render_include(
|
def _render_include(
|
||||||
attrs: dict[str, str],
|
attrs: dict[str, str],
|
||||||
root: Path,
|
root: Path,
|
||||||
current_dir: Path,
|
current_dir: Path,
|
||||||
|
source_path: Path | None,
|
||||||
stack: list[Path],
|
stack: list[Path],
|
||||||
included: list[Path],
|
included: list[Path],
|
||||||
|
provenance: list[OperationProvenance],
|
||||||
depth: int,
|
depth: int,
|
||||||
max_depth: int,
|
max_depth: int,
|
||||||
|
*,
|
||||||
|
marker_line: int,
|
||||||
) -> str:
|
) -> str:
|
||||||
raw_path = attrs.get("path")
|
raw_path = attrs.get("path")
|
||||||
if not raw_path:
|
if not raw_path:
|
||||||
@@ -228,12 +371,33 @@ def _render_include(
|
|||||||
body = shift_heading_levels(body, heading_delta)
|
body = shift_heading_levels(body, heading_delta)
|
||||||
|
|
||||||
included.append(include_path)
|
included.append(include_path)
|
||||||
|
provenance.append(
|
||||||
|
OperationProvenance(
|
||||||
|
operation="include",
|
||||||
|
source_path=str(source_path) if source_path else None,
|
||||||
|
line_start=marker_line,
|
||||||
|
line_end=marker_line,
|
||||||
|
target_path=str(include_path),
|
||||||
|
dependencies=[str(include_path)],
|
||||||
|
metadata={
|
||||||
|
key: value
|
||||||
|
for key, value in {
|
||||||
|
"selector": selector,
|
||||||
|
"heading_delta": heading_delta if heading_delta else None,
|
||||||
|
"include_frontmatter": attrs.get("include_frontmatter"),
|
||||||
|
}.items()
|
||||||
|
if value is not None
|
||||||
|
},
|
||||||
|
)
|
||||||
|
)
|
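The `metadata` argument above is built with a filter-by-`None` dict comprehension. A subtle consequence is that `heading_delta if heading_delta else None` turns a delta of `0` into `None`, so an unshifted include records no `heading_delta` key at all. The standalone function below mirrors only that comprehension. The function name is hypothetical, and the real `selector` and `heading_delta` values are computed in a part of `_render_include` that this diff elides.

```python
from __future__ import annotations

from typing import Any


def include_metadata(
    selector: str | None,
    heading_delta: int,
    include_frontmatter: str | None,
) -> dict[str, Any]:
    """Hypothetical mirror of the provenance metadata comprehension in _render_include."""
    return {
        key: value
        for key, value in {
            "selector": selector,
            # 0 is falsy, so an unshifted include omits heading_delta entirely.
            "heading_delta": heading_delta if heading_delta else None,
            "include_frontmatter": include_frontmatter,
        }.items()
        if value is not None
    }
```

Keeping absent options out of the record (rather than storing `None` or `0`) keeps serialized provenance compact and stable across runs.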
||||||
return _resolve_include_text(
|
return _resolve_include_text(
|
||||||
body.strip(),
|
body.strip(),
|
||||||
root=root,
|
root=root,
|
||||||
current_dir=include_path.parent,
|
current_dir=include_path.parent,
|
||||||
|
source_path=include_path,
|
||||||
stack=stack + [include_path],
|
stack=stack + [include_path],
|
||||||
included=included,
|
included=included,
|
||||||
|
provenance=provenance,
|
||||||
depth=depth + 1,
|
depth=depth + 1,
|
||||||
max_depth=max_depth,
|
max_depth=max_depth,
|
||||||
)
|
)
|
||||||
|
|||||||
27
src/markitect_tool/processor/__init__.py
Normal file
@@ -0,0 +1,27 @@
|
|||||||
|
"""Deterministic fenced-block processor registry."""
|
||||||
|
|
||||||
|
from markitect_tool.processor.engine import (
|
||||||
|
FencedProcessorBlock,
|
||||||
|
ProcessorContext,
|
||||||
|
ProcessorOutputFile,
|
||||||
|
ProcessorRegistry,
|
||||||
|
ProcessorRequest,
|
||||||
|
ProcessorResult,
|
||||||
|
ProcessorRun,
|
||||||
|
default_processor_registry,
|
||||||
|
discover_fenced_processors,
|
||||||
|
run_fenced_processors,
|
||||||
|
)
|
||||||
|
|
||||||
|
__all__ = [
|
||||||
|
"FencedProcessorBlock",
|
||||||
|
"ProcessorContext",
|
||||||
|
"ProcessorOutputFile",
|
||||||
|
"ProcessorRegistry",
|
||||||
|
"ProcessorRequest",
|
||||||
|
"ProcessorResult",
|
||||||
|
"ProcessorRun",
|
||||||
|
"default_processor_registry",
|
||||||
|
"discover_fenced_processors",
|
||||||
|
"run_fenced_processors",
|
||||||
|
]
|
||||||
374
src/markitect_tool/processor/engine.py
Normal file
@@ -0,0 +1,374 @@
|
|||||||
|
"""Processor API for deterministic fenced-block workflows."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import hashlib
|
||||||
|
import re
|
||||||
|
import shlex
|
||||||
|
from dataclasses import asdict, dataclass, field
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Any, Callable
|
||||||
|
|
||||||
|
from markdown_it import MarkdownIt
|
||||||
|
|
||||||
|
from markitect_tool.diagnostics import Diagnostic, SourceLocation
|
||||||
|
from markitect_tool.ops import OperationProvenance
|
||||||
|
from markitect_tool.reference import (
|
||||||
|
ReferenceContext,
|
||||||
|
ReferenceResolutionError,
|
||||||
|
resolve_reference,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
ProcessorCallable = Callable[["ProcessorRequest"], "ProcessorResult"]
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class FencedProcessorBlock:
|
||||||
|
"""A fenced Markdown block that opted into processor handling."""
|
||||||
|
|
||||||
|
processor: str
|
||||||
|
content: str
|
||||||
|
unit_id: str
|
||||||
|
attrs: dict[str, str]
|
||||||
|
language: str | None = None
|
||||||
|
source_path: str | None = None
|
||||||
|
line_start: int | None = None
|
||||||
|
line_end: int | None = None
|
||||||
|
content_hash: str = ""
|
||||||
|
|
||||||
|
def to_dict(self) -> dict[str, Any]:
|
||||||
|
return {key: value for key, value in asdict(self).items() if value not in (None, {}, "")}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class ProcessorContext:
|
||||||
|
"""Execution context passed to deterministic processors."""
|
||||||
|
|
||||||
|
root: Path = Path(".")
|
||||||
|
current_path: Path | None = None
|
||||||
|
namespaces: dict[str, str] = field(default_factory=dict)
|
||||||
|
variables: dict[str, Any] = field(default_factory=dict)
|
||||||
|
policy: dict[str, Any] = field(default_factory=dict)
|
||||||
|
|
||||||
|
def reference_context(self) -> ReferenceContext:
|
||||||
|
return ReferenceContext(
|
||||||
|
root=self.root,
|
||||||
|
current_path=self.current_path,
|
||||||
|
namespaces=self.namespaces,
|
||||||
|
)
|
||||||
|
|
||||||
|
def to_dict(self) -> dict[str, Any]:
|
||||||
|
data = {
|
||||||
|
"root": str(self.root),
|
||||||
|
"current_path": str(self.current_path) if self.current_path else None,
|
||||||
|
"namespaces": self.namespaces,
|
||||||
|
"variables": self.variables,
|
||||||
|
"policy": self.policy,
|
||||||
|
}
|
||||||
|
return {key: value for key, value in data.items() if value not in (None, {}, "")}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class ProcessorRequest:
|
||||||
|
"""One processor invocation."""
|
||||||
|
|
||||||
|
block: FencedProcessorBlock
|
||||||
|
context: ProcessorContext
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class ProcessorOutputFile:
|
||||||
|
"""A generated file requested by a processor."""
|
||||||
|
|
||||||
|
path: str
|
||||||
|
content: str
|
||||||
|
|
||||||
|
def to_dict(self) -> dict[str, Any]:
|
||||||
|
return asdict(self)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class ProcessorResult:
|
||||||
|
"""Deterministic processor result envelope."""
|
||||||
|
|
||||||
|
content: str | None = None
|
||||||
|
files: list[ProcessorOutputFile] = field(default_factory=list)
|
||||||
|
diagnostics: list[Diagnostic] = field(default_factory=list)
|
||||||
|
dependencies: list[str] = field(default_factory=list)
|
||||||
|
provenance: list[OperationProvenance] = field(default_factory=list)
|
||||||
|
|
||||||
|
@property
|
||||||
|
def valid(self) -> bool:
|
||||||
|
return not any(diagnostic.severity == "error" for diagnostic in self.diagnostics)
|
||||||
|
|
||||||
|
def to_dict(self) -> dict[str, Any]:
|
||||||
|
data = {
|
||||||
|
"valid": self.valid,
|
||||||
|
"content": self.content,
|
||||||
|
"files": [file.to_dict() for file in self.files],
|
||||||
|
"diagnostics": [diagnostic.to_dict() for diagnostic in self.diagnostics],
|
||||||
|
"dependencies": self.dependencies,
|
||||||
|
"provenance": [event.to_dict() for event in self.provenance],
|
||||||
|
}
|
||||||
|
return {key: value for key, value in data.items() if value not in (None, [], {})}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class ProcessorRun:
|
||||||
|
"""Results from running all processor blocks in a document."""
|
||||||
|
|
||||||
|
source_path: str | None
|
||||||
|
blocks: list[FencedProcessorBlock]
|
||||||
|
results: list[ProcessorResult]
|
||||||
|
|
||||||
|
@property
|
||||||
|
def valid(self) -> bool:
|
||||||
|
return all(result.valid for result in self.results)
|
||||||
|
|
||||||
|
def to_dict(self) -> dict[str, Any]:
|
||||||
|
return {
|
||||||
|
"valid": self.valid,
|
||||||
|
"source_path": self.source_path,
|
||||||
|
"count": len(self.results),
|
||||||
|
"blocks": [block.to_dict() for block in self.blocks],
|
||||||
|
"results": [result.to_dict() for result in self.results],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class ProcessorRegistry:
|
||||||
|
"""Explicit registry for deterministic fenced-block processors."""
|
||||||
|
|
||||||
|
def __init__(self) -> None:
|
||||||
|
self._processors: dict[str, ProcessorCallable] = {}
|
||||||
|
|
||||||
|
def register(self, name: str, processor: ProcessorCallable) -> None:
|
||||||
|
key = _slug(name)
|
||||||
|
if not key:
|
||||||
|
raise ValueError("Processor name cannot be empty")
|
||||||
|
self._processors[key] = processor
|
||||||
|
|
||||||
|
def names(self) -> list[str]:
|
||||||
|
return sorted(self._processors)
|
||||||
|
|
||||||
|
def run(self, request: ProcessorRequest) -> ProcessorResult:
|
||||||
|
processor = self._processors.get(_slug(request.block.processor))
|
||||||
|
if processor is None:
|
||||||
|
return ProcessorResult(
|
||||||
|
diagnostics=[
|
||||||
|
Diagnostic(
|
||||||
|
severity="error",
|
||||||
|
code="processor.unknown",
|
||||||
|
message=f"Unknown processor `{request.block.processor}`",
|
||||||
|
source=SourceLocation(
|
||||||
|
path=request.block.source_path,
|
||||||
|
line=request.block.line_start,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
]
|
||||||
|
)
|
||||||
|
return processor(request)
|
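`ProcessorRegistry` normalizes names through `_slug` on both `register` and `run`. An unknown processor is reported as a `processor.unknown` error diagnostic inside a `ProcessorResult` instead of being raised as an exception, so `valid` becomes `False`. The following self-contained sketch reproduces that flow. `Diag` and `Result` are simplified, hypothetical stand-ins for `Diagnostic` and `ProcessorResult`, and processors here take plain text rather than a `ProcessorRequest`.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Diag:
    """Simplified stand-in for markitect_tool.diagnostics.Diagnostic."""

    severity: str
    code: str
    message: str


@dataclass(frozen=True)
class Result:
    """Simplified stand-in for ProcessorResult."""

    content: str | None = None
    diagnostics: list[Diag] = field(default_factory=list)

    @property
    def valid(self) -> bool:
        return not any(d.severity == "error" for d in self.diagnostics)


def slug(value: str) -> str:
    """Same normalization as the diff's _slug."""
    value = re.sub(r"[^a-z0-9_.:-]+", "-", value.strip().lower())
    return re.sub(r"-+", "-", value).strip("-")


class Registry:
    def __init__(self) -> None:
        self._processors: dict[str, Callable[[str], Result]] = {}

    def register(self, name: str, processor: Callable[[str], Result]) -> None:
        key = slug(name)
        if not key:
            raise ValueError("Processor name cannot be empty")
        self._processors[key] = processor  # later registrations overwrite earlier ones

    def run(self, name: str, content: str) -> Result:
        processor = self._processors.get(slug(name))
        if processor is None:
            # Unknown names become an error diagnostic, not an exception.
            return Result(diagnostics=[Diag("error", "processor.unknown", f"Unknown processor `{name}`")])
        return processor(content)
```

One consequence of slug-based keys is that `Upper Case` and `upper-case` name the same processor, so registering both silently keeps only the last one.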
||||||
|
|
||||||
|
|
||||||
|
def default_processor_registry() -> ProcessorRegistry:
|
||||||
|
"""Create the default deterministic processor registry."""
|
||||||
|
|
||||||
|
registry = ProcessorRegistry()
|
||||||
|
registry.register("identity", _identity_processor)
|
||||||
|
registry.register("uppercase", _uppercase_processor)
|
||||||
|
registry.register("include", _include_processor)
|
||||||
|
return registry
|
||||||
|
|
||||||
|
|
||||||
|
def discover_fenced_processors(
|
||||||
|
markdown: str,
|
||||||
|
*,
|
||||||
|
source_path: str | Path | None = None,
|
||||||
|
) -> list[FencedProcessorBlock]:
|
||||||
|
"""Discover fenced blocks that explicitly opt into processor handling."""
|
||||||
|
|
||||||
|
parser = MarkdownIt("commonmark", {"tables": True}).enable("table")
|
||||||
|
blocks: list[FencedProcessorBlock] = []
|
||||||
|
used_ids: dict[str, int] = {}
|
||||||
|
for index, token in enumerate(parser.parse(markdown)):
|
||||||
|
if token.type != "fence":
|
||||||
|
continue
|
||||||
|
attrs = _parse_fence_info(token.info)
|
||||||
|
processor = _processor_name(attrs)
|
||||||
|
if not processor:
|
||||||
|
continue
|
||||||
|
unit_id = _dedupe_id(_slug(attrs.get("id") or f"{processor}-{index}"), used_ids)
|
||||||
|
line_start = token.map[0] + 1 if token.map else None
|
||||||
|
line_end = token.map[1] if token.map else None
|
||||||
|
blocks.append(
|
||||||
|
FencedProcessorBlock(
|
||||||
|
processor=processor,
|
||||||
|
content=token.content,
|
||||||
|
unit_id=unit_id,
|
||||||
|
attrs={
|
||||||
|
key: value
|
||||||
|
for key, value in attrs.items()
|
||||||
|
if key not in {"id", "language", "processor"}
|
||||||
|
},
|
||||||
|
language=attrs.get("language"),
|
||||||
|
source_path=str(source_path) if source_path else None,
|
||||||
|
line_start=line_start,
|
||||||
|
line_end=line_end,
|
||||||
|
content_hash=_hash_text(token.content),
|
||||||
|
)
|
||||||
|
)
|
||||||
|
return blocks
|
||||||
|
|
||||||
|
|
||||||
|
def run_fenced_processors(
|
||||||
|
markdown: str,
|
||||||
|
*,
|
||||||
|
context: ProcessorContext,
|
||||||
|
registry: ProcessorRegistry | None = None,
|
||||||
|
source_path: str | Path | None = None,
|
||||||
|
) -> ProcessorRun:
|
||||||
|
"""Run all processor-marked fenced blocks in document order."""
|
||||||
|
|
||||||
|
active_registry = registry or default_processor_registry()
|
||||||
|
blocks = discover_fenced_processors(markdown, source_path=source_path or context.current_path)
|
||||||
|
results = [
|
||||||
|
active_registry.run(ProcessorRequest(block=block, context=context))
|
||||||
|
for block in blocks
|
||||||
|
]
|
||||||
|
return ProcessorRun(
|
||||||
|
source_path=str(source_path or context.current_path) if source_path or context.current_path else None,
|
||||||
|
blocks=blocks,
|
||||||
|
results=results,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _identity_processor(request: ProcessorRequest) -> ProcessorResult:
|
||||||
|
return ProcessorResult(
|
||||||
|
content=request.block.content,
|
||||||
|
provenance=[
|
||||||
|
OperationProvenance(
|
||||||
|
operation="processor.identity",
|
||||||
|
source_path=request.block.source_path,
|
||||||
|
line_start=request.block.line_start,
|
||||||
|
line_end=request.block.line_end,
|
||||||
|
metadata={"unit_id": request.block.unit_id},
|
||||||
|
)
|
||||||
|
],
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _uppercase_processor(request: ProcessorRequest) -> ProcessorResult:
|
||||||
|
return ProcessorResult(
|
||||||
|
content=request.block.content.upper(),
|
||||||
|
provenance=[
|
||||||
|
OperationProvenance(
|
||||||
|
operation="processor.uppercase",
|
||||||
|
source_path=request.block.source_path,
|
||||||
|
line_start=request.block.line_start,
|
||||||
|
line_end=request.block.line_end,
|
||||||
|
metadata={"unit_id": request.block.unit_id},
|
||||||
|
)
|
||||||
|
],
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _include_processor(request: ProcessorRequest) -> ProcessorResult:
|
||||||
|
reference = request.block.attrs.get("ref")
|
||||||
|
if not reference:
|
||||||
|
return ProcessorResult(
|
||||||
|
diagnostics=[
|
||||||
|
Diagnostic(
|
||||||
|
severity="error",
|
||||||
|
code="processor.include.missing_ref",
|
||||||
|
message="Include processor requires a `ref` attribute",
|
||||||
|
source=SourceLocation(
|
||||||
|
path=request.block.source_path,
|
||||||
|
line=request.block.line_start,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
]
|
||||||
|
)
|
||||||
|
try:
|
||||||
|
resolution = resolve_reference(reference, context=request.context.reference_context())
|
||||||
|
except ReferenceResolutionError as exc:
|
||||||
|
return ProcessorResult(
|
||||||
|
diagnostics=[
|
||||||
|
Diagnostic(
|
||||||
|
severity="error",
|
||||||
|
code="processor.include.reference_error",
|
||||||
|
message=str(exc),
|
||||||
|
source=SourceLocation(
|
||||||
|
path=request.block.source_path,
|
||||||
|
line=request.block.line_start,
|
||||||
|
),
|
||||||
|
)
|
||||||
|
]
|
||||||
|
)
|
||||||
|
content = "\n\n".join(unit.text for unit in resolution.units)
|
||||||
|
return ProcessorResult(
|
||||||
|
content=content,
|
||||||
|
dependencies=[resolution.target_path],
|
||||||
|
provenance=[
|
||||||
|
OperationProvenance(
|
||||||
|
operation="processor.include",
|
||||||
|
source_path=request.block.source_path,
|
||||||
|
line_start=request.block.line_start,
|
||||||
|
line_end=request.block.line_end,
|
||||||
|
target_path=resolution.target_path,
|
||||||
|
dependencies=[resolution.target_path],
|
||||||
|
metadata={"ref": reference, "unit_ids": [unit.unit_id for unit in resolution.units]},
|
||||||
|
)
|
||||||
|
],
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _processor_name(attrs: dict[str, str]) -> str | None:
|
||||||
|
if "processor" in attrs:
|
||||||
|
return attrs["processor"]
|
||||||
|
language = attrs.get("language", "")
|
||||||
|
if language.startswith("mkt-"):
|
||||||
|
return language.removeprefix("mkt-")
|
||||||
|
if language == "mkt" and "type" in attrs:
|
||||||
|
return attrs["type"]
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_fence_info(info: str) -> dict[str, str]:
|
||||||
|
match = re.match(r"^(?P<language>[^\s{]+)?(?:\s+\{(?P<attrs>.*)\})?\s*$", info.strip())
|
||||||
|
if not match:
|
||||||
|
return {"language": info.strip()} if info.strip() else {}
|
||||||
|
attrs = _parse_attrs(match.group("attrs") or "")
|
||||||
|
language = match.group("language")
|
||||||
|
if language:
|
||||||
|
attrs["language"] = language
|
||||||
|
return attrs
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_attrs(raw: str) -> dict[str, str]:
|
||||||
|
attrs: dict[str, str] = {}
|
||||||
|
for part in shlex.split(raw):
|
||||||
|
if part.startswith("#") and len(part) > 1:
|
||||||
|
attrs["id"] = part[1:]
|
||||||
|
continue
|
||||||
|
if "=" not in part:
|
||||||
|
attrs[part] = "true"
|
||||||
|
continue
|
||||||
|
key, value = part.split("=", 1)
|
||||||
|
attrs[key.strip()] = value.strip()
|
||||||
|
return attrs
|
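Taken together, `_parse_fence_info`, `_parse_attrs` and `_processor_name` accept three opt-in spellings for a processor fence: an explicit `processor=` attribute, an `mkt-<name>` info-string language, or `mkt` plus a `type=` attribute. Attributes are split with `shlex`, `#x` becomes `id`, and bare flags become the string `"true"`. The standalone copy below is a sketch of that logic, not an import of the module's private helpers. It swaps `str.removeprefix` for slicing so it also runs on Python versions before 3.9.

```python
from __future__ import annotations

import re
import shlex

FENCE_INFO_RE = re.compile(r"^(?P<language>[^\s{]+)?(?:\s+\{(?P<attrs>.*)\})?\s*$")


def parse_attrs(raw: str) -> dict[str, str]:
    attrs: dict[str, str] = {}
    for part in shlex.split(raw):  # POSIX quoting: ref="a b" -> ref=a b
        if part.startswith("#") and len(part) > 1:
            attrs["id"] = part[1:]
        elif "=" not in part:
            attrs[part] = "true"  # bare flags are stored as the string "true"
        else:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def parse_fence_info(info: str) -> dict[str, str]:
    match = FENCE_INFO_RE.match(info.strip())
    if not match:
        return {"language": info.strip()} if info.strip() else {}
    attrs = parse_attrs(match.group("attrs") or "")
    if match.group("language"):
        attrs["language"] = match.group("language")
    return attrs


def processor_name(attrs: dict[str, str]) -> str | None:
    if "processor" in attrs:
        return attrs["processor"]
    language = attrs.get("language", "")
    if language.startswith("mkt-"):
        return language[len("mkt-"):]
    if language == "mkt" and "type" in attrs:
        return attrs["type"]
    return None
```

For example, the info string `mkt-uppercase {#greeting ref="std:a.md" strict}` parses to `{"id": "greeting", "ref": "std:a.md", "strict": "true", "language": "mkt-uppercase"}` and selects the `uppercase` processor, while a plain `python` fence opts out. Note that `shlex.split` raises `ValueError` on an unbalanced quote, and neither this sketch nor the code in the diff catches it.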
||||||
|
|
||||||
|
|
||||||
|
def _dedupe_id(unit_id: str, used_ids: dict[str, int]) -> str:
|
||||||
|
count = used_ids.get(unit_id, 0) + 1
|
||||||
|
used_ids[unit_id] = count
|
||||||
|
return unit_id if count == 1 else f"{unit_id}-{count}"
|
||||||
|
|
||||||
|
|
||||||
|
def _slug(value: str) -> str:
|
||||||
|
slug = re.sub(r"[^a-z0-9_.:-]+", "-", value.strip().lower())
|
||||||
|
slug = re.sub(r"-+", "-", slug).strip("-")
|
||||||
|
return slug
|
||||||
|
|
||||||
|
|
||||||
|
def _hash_text(text: str) -> str:
|
||||||
|
return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
|
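Block ids come from `_slug` (lowercase, runs of disallowed characters collapsed to a single `-`) and are then made unique by `_dedupe_id`, which appends `-2`, `-3`, and so on per base id. Content hashes are prefixed `sha256:` hex digests of the UTF-8 text. The copies below are standalone so they can be exercised directly.

```python
from __future__ import annotations

import hashlib
import re


def slug(value: str) -> str:
    value = re.sub(r"[^a-z0-9_.:-]+", "-", value.strip().lower())
    return re.sub(r"-+", "-", value).strip("-")


def dedupe_id(unit_id: str, used_ids: dict[str, int]) -> str:
    # Counts occurrences per *base* id; the suffixed ids are not recorded.
    count = used_ids.get(unit_id, 0) + 1
    used_ids[unit_id] = count
    return unit_id if count == 1 else f"{unit_id}-{count}"


def hash_text(text: str) -> str:
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
```

With a fresh `used = {}`, the inputs `"Intro"`, `"intro"`, `"Payment Terms!"` yield `intro`, `intro-2`, `payment-terms`. Because the counter is kept per base id, an explicit `#intro-2` followed by two `intro` blocks produces `intro-2` twice. Callers that need strict uniqueness may want to check the generated id against `used_ids` as well.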
||||||
25
src/markitect_tool/reference/__init__.py
Normal file
@@ -0,0 +1,25 @@
|
|||||||
|
"""Namespaced content reference resolution for Markdown artifacts."""
|
||||||
|
|
||||||
|
from markitect_tool.reference.engine import (
|
||||||
|
ContentUnit,
|
||||||
|
ReferenceAddress,
|
||||||
|
ReferenceContext,
|
||||||
|
ReferenceResolution,
|
||||||
|
ReferenceResolutionError,
|
||||||
|
SourceSpan,
|
||||||
|
load_namespaces,
|
||||||
|
parse_reference,
|
||||||
|
resolve_reference,
|
||||||
|
)
|
||||||
|
|
||||||
|
__all__ = [
|
||||||
|
"ContentUnit",
|
||||||
|
"ReferenceAddress",
|
||||||
|
"ReferenceContext",
|
||||||
|
"ReferenceResolution",
|
||||||
|
"ReferenceResolutionError",
|
||||||
|
"SourceSpan",
|
||||||
|
"load_namespaces",
|
||||||
|
"parse_reference",
|
||||||
|
"resolve_reference",
|
||||||
|
]
|
||||||
626
src/markitect_tool/reference/engine.py
Normal file
@@ -0,0 +1,626 @@
|
|||||||
|
"""Reference parsing and resolution for Markdown content units."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import hashlib
|
||||||
|
import re
|
||||||
|
import shlex
|
||||||
|
from dataclasses import asdict, dataclass, field
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
from markdown_it import MarkdownIt
|
||||||
|
|
||||||
|
from markitect_tool.core import ContentBlock, Document, Heading, Section, parse_markdown
|
||||||
|
from markitect_tool.query import InvalidQueryError, QueryMatch, query_document
|
||||||
|
|
||||||
|
|
||||||
|
class ReferenceResolutionError(ValueError):
|
||||||
|
"""Raised when a content reference cannot be resolved."""
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class ReferenceAddress:
|
||||||
|
"""Parsed content reference address.
|
||||||
|
|
||||||
|
Syntax is intentionally compact and Markdown-friendly:
|
||||||
|
|
||||||
|
- ``path/to/file.md``
|
||||||
|
- ``std:clauses/payment.md``
|
||||||
|
- ``std:clauses/payment.md#section:terms``
|
||||||
|
- ``std:clauses/payment.md::sections[heading=Terms]``
|
||||||
|
- ``#intro`` for a fragment in the current document
|
||||||
|
"""
|
||||||
|
|
||||||
|
raw: str
|
||||||
|
namespace: str | None = None
|
||||||
|
address: str = ""
|
||||||
|
fragment: str | None = None
|
||||||
|
selector: str | None = None
|
||||||
|
|
||||||
|
def to_dict(self) -> dict[str, Any]:
|
||||||
|
return {
|
||||||
|
key: value
|
||||||
|
for key, value in asdict(self).items()
|
||||||
|
if value is not None and value != ""
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class ReferenceContext:
|
||||||
|
"""Inputs used to resolve namespaced and relative content references."""
|
||||||
|
|
||||||
|
root: Path = Path(".")
|
||||||
|
current_path: Path | None = None
|
||||||
|
namespaces: dict[str, str] = field(default_factory=dict)
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
def from_document(
|
||||||
|
cls,
|
||||||
|
document: Document,
|
||||||
|
*,
|
||||||
|
root: str | Path = ".",
|
||||||
|
current_path: str | Path | None = None,
|
||||||
|
) -> "ReferenceContext":
|
||||||
|
"""Build a reference context from document frontmatter."""
|
||||||
|
|
||||||
|
source_path = current_path or document.source_path
|
||||||
|
return cls(
|
||||||
|
root=Path(root),
|
||||||
|
current_path=Path(source_path) if source_path else None,
|
||||||
|
namespaces=load_namespaces(document.frontmatter),
|
||||||
|
)
|
||||||
|
|
||||||
|
def to_dict(self) -> dict[str, Any]:
|
||||||
|
data = {
|
||||||
|
"root": str(self.root),
|
||||||
|
"current_path": str(self.current_path) if self.current_path else None,
|
||||||
|
"namespaces": self.namespaces,
|
||||||
|
}
|
||||||
|
return {key: value for key, value in data.items() if value is not None}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class SourceSpan:
|
||||||
|
"""Line span for a resolved unit in its source file."""
|
||||||
|
|
||||||
|
line_start: int | None = None
|
||||||
|
line_end: int | None = None
|
||||||
|
|
||||||
|
def to_dict(self) -> dict[str, Any]:
|
||||||
|
return {key: value for key, value in asdict(self).items() if value is not None}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class ContentUnit:
|
||||||
|
"""One addressable content unit resolved from Markdown."""
|
||||||
|
|
||||||
|
kind: str
|
||||||
|
unit_id: str
|
||||||
|
text: str
|
||||||
|
source_path: str
|
||||||
|
span: SourceSpan | None = None
|
||||||
|
name: str | None = None
|
||||||
|
content_hash: str = ""
|
||||||
|
metadata: dict[str, Any] = field(default_factory=dict)
|
||||||
|
|
||||||
|
def to_dict(self) -> dict[str, Any]:
|
||||||
|
data = {
|
||||||
|
"kind": self.kind,
|
||||||
|
"unit_id": self.unit_id,
|
||||||
|
"name": self.name,
|
||||||
|
"source_path": self.source_path,
|
||||||
|
"span": self.span.to_dict() if self.span else None,
|
||||||
|
"content_hash": self.content_hash,
|
||||||
|
"metadata": self.metadata or None,
|
||||||
|
"text": self.text,
|
||||||
|
}
|
||||||
|
return {key: value for key, value in data.items() if value is not None}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class ReferenceResolution:
|
||||||
|
"""Resolved content reference and its dependency edge."""
|
||||||
|
|
||||||
|
reference: ReferenceAddress
|
||||||
|
source_path: str
|
||||||
|
target_path: str
|
||||||
|
units: list[ContentUnit]
|
||||||
|
|
||||||
|
def to_dict(self) -> dict[str, Any]:
|
||||||
|
return {
|
||||||
|
"reference": self.reference.to_dict(),
|
||||||
|
"source_path": self.source_path,
|
||||||
|
"target_path": self.target_path,
|
||||||
|
"count": len(self.units),
|
||||||
|
"units": [unit.to_dict() for unit in self.units],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
_NAMESPACE_RE = re.compile(r"^(?P<namespace>[A-Za-z][A-Za-z0-9_.-]*):(?P<address>.*)$")
|
||||||
|
_HEADING_ID_RE = re.compile(r"^(?P<title>.*?)(?:\s+\{#(?P<id>[A-Za-z0-9_.:-]+)\})?$")
|
||||||
|
_REGION_OPEN_RE = re.compile(r"<!--\s*mkt:region\s+(?P<attrs>.*?)\s*-->")
|
||||||
|
_REGION_CLOSE_RE = re.compile(r"<!--\s*/mkt:region\s*-->")
|
||||||
|
_FENCE_ATTRS_RE = re.compile(r"^(?P<language>[^\s{]+)?(?:\s+\{(?P<attrs>.*)\})?\s*$")
|
||||||
|
|
||||||
|
|
||||||
|
def parse_reference(reference: str) -> ReferenceAddress:
|
||||||
|
"""Parse a compact Markitect content reference."""
|
||||||
|
|
||||||
|
raw = reference.strip()
|
||||||
|
if not raw:
|
||||||
|
raise ReferenceResolutionError("Reference cannot be empty")
|
||||||
|
|
||||||
|
selector: str | None = None
|
||||||
|
base = raw
|
||||||
|
if "::" in base:
|
||||||
|
base, selector = base.split("::", 1)
|
||||||
|
selector = selector.strip()
|
||||||
|
if not selector:
|
||||||
|
raise ReferenceResolutionError(f"Reference selector is empty in `{reference}`")
|
||||||
|
|
||||||
|
fragment: str | None = None
|
||||||
|
if "#" in base:
|
||||||
|
base, fragment = base.split("#", 1)
|
||||||
|
fragment = fragment.strip()
|
||||||
|
if not fragment:
|
||||||
|
raise ReferenceResolutionError(f"Reference fragment is empty in `{reference}`")
|
||||||
|
|
||||||
|
namespace: str | None = None
|
||||||
|
address = base.strip()
|
||||||
|
match = _NAMESPACE_RE.match(address)
|
||||||
|
if match and "/" not in match.group("namespace") and "\\" not in match.group("namespace"):
|
||||||
|
namespace = match.group("namespace")
|
||||||
|
address = match.group("address").strip()
|
||||||
|
|
||||||
|
return ReferenceAddress(
|
||||||
|
raw=raw,
|
||||||
|
namespace=namespace,
|
||||||
|
address=address,
|
||||||
|
fragment=fragment,
|
||||||
|
selector=selector,
|
||||||
|
)
|
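`parse_reference` splits in a fixed order: the `::` selector first, then the `#` fragment, then an optional `namespace:` prefix matched by `_NAMESPACE_RE`. A standalone sketch of that order follows, using a hypothetical `Ref` dataclass in place of `ReferenceAddress` (the `raw` field is omitted) and plain `ValueError` in place of `ReferenceResolutionError`.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

NAMESPACE_RE = re.compile(r"^(?P<namespace>[A-Za-z][A-Za-z0-9_.-]*):(?P<address>.*)$")


@dataclass(frozen=True)
class Ref:
    """Hypothetical trimmed-down ReferenceAddress."""

    namespace: str | None
    address: str
    fragment: str | None
    selector: str | None


def parse_ref(reference: str) -> Ref:
    base = reference.strip()
    if not base:
        raise ValueError("Reference cannot be empty")
    selector: str | None = None
    fragment: str | None = None
    if "::" in base:  # 1. selector is split off first
        base, selector = base.split("::", 1)
        selector = selector.strip()
        if not selector:
            raise ValueError(f"Reference selector is empty in `{reference}`")
    if "#" in base:  # 2. then the fragment
        base, fragment = base.split("#", 1)
        fragment = fragment.strip()
        if not fragment:
            raise ValueError(f"Reference fragment is empty in `{reference}`")
    namespace: str | None = None
    address = base.strip()
    match = NAMESPACE_RE.match(address)  # 3. finally an optional namespace prefix
    if match:
        namespace, address = match.group("namespace"), match.group("address").strip()
    return Ref(namespace, address, fragment, selector)
```

Because the selector is split first, colons inside a fragment such as `#section:terms` are safe. One caveat follows from `_NAMESPACE_RE`: a Windows-style path like `C:/docs/a.md` parses as namespace `C`, which `resolve_reference` would then reject as an unknown namespace.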
||||||
|
|
||||||
|
|
||||||
|
def load_namespaces(frontmatter: dict[str, Any]) -> dict[str, str]:
|
||||||
|
"""Load namespace mappings from Markdown frontmatter."""
|
||||||
|
|
||||||
|
raw_namespaces = frontmatter.get("namespaces", {})
|
||||||
|
if raw_namespaces is None:
|
||||||
|
return {}
|
||||||
|
if not isinstance(raw_namespaces, dict):
|
||||||
|
raise ReferenceResolutionError("Frontmatter `namespaces` must be a mapping")
|
||||||
|
|
||||||
|
namespaces: dict[str, str] = {}
|
||||||
|
for raw_key, raw_value in raw_namespaces.items():
|
||||||
|
key = str(raw_key).strip().rstrip(":")
|
||||||
|
if not key:
|
||||||
|
raise ReferenceResolutionError("Namespace keys cannot be empty")
|
||||||
|
if not _NAMESPACE_RE.match(f"{key}:"):
|
||||||
|
raise ReferenceResolutionError(f"Invalid namespace key `{raw_key}`")
|
||||||
|
if not isinstance(raw_value, str):
|
||||||
|
raise ReferenceResolutionError(f"Namespace `{key}` must map to a string path")
|
||||||
|
value = raw_value.strip()
|
||||||
|
if not value:
|
||||||
|
raise ReferenceResolutionError(f"Namespace `{key}` cannot map to an empty path")
|
||||||
|
namespaces[key] = value
|
||||||
|
return namespaces
|
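`load_namespaces` reads a `namespaces` mapping from frontmatter. It strips a trailing `:` from keys (so `std:` and `std` are equivalent), validates each key against `_NAMESPACE_RE`, and requires non-empty string paths. The condensed, standalone sketch below merges a few of the original checks into single error messages and raises plain `ValueError`. The original raises `ReferenceResolutionError`, which subclasses `ValueError`.

```python
from __future__ import annotations

import re
from typing import Any

NAMESPACE_RE = re.compile(r"^(?P<namespace>[A-Za-z][A-Za-z0-9_.-]*):(?P<address>.*)$")


def load_namespaces(frontmatter: dict[str, Any]) -> dict[str, str]:
    raw = frontmatter.get("namespaces", {})
    if raw is None:
        return {}
    if not isinstance(raw, dict):
        raise ValueError("Frontmatter `namespaces` must be a mapping")
    namespaces: dict[str, str] = {}
    for raw_key, raw_value in raw.items():
        key = str(raw_key).strip().rstrip(":")  # "std:" and "std" are equivalent
        if not key or not NAMESPACE_RE.match(f"{key}:"):
            raise ValueError(f"Invalid namespace key `{raw_key}`")
        if not isinstance(raw_value, str) or not raw_value.strip():
            raise ValueError(f"Namespace `{key}` must map to a non-empty string path")
        namespaces[key] = raw_value.strip()
    return namespaces
```

Given frontmatter parsed to `{"namespaces": {"std:": " lib/std ", "legal": "vendor/legal"}}`, the result is `{"std": "lib/std", "legal": "vendor/legal"}`. A key that does not start with a letter, such as `9x`, is rejected.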
||||||
|
|
||||||
|
|
||||||
|
def resolve_reference(
|
||||||
|
reference: str | ReferenceAddress,
|
||||||
|
*,
|
||||||
|
context: ReferenceContext,
|
||||||
|
) -> ReferenceResolution:
|
||||||
|
"""Resolve a content reference to one or more content units."""
|
||||||
|
|
||||||
|
address = parse_reference(reference) if isinstance(reference, str) else reference
|
||||||
|
root = context.root.resolve()
|
||||||
|
source_path = context.current_path.resolve() if context.current_path else root
|
||||||
|
target_path = _resolve_target_path(address, context, root, source_path)
|
||||||
|
if not target_path.exists() or not target_path.is_file():
|
||||||
|
raise ReferenceResolutionError(f"Referenced file not found: {target_path}")
|
||||||
|
|
||||||
|
markdown = target_path.read_text(encoding="utf-8")
|
||||||
|
document = parse_markdown(markdown, source_path=str(target_path))
|
||||||
|
|
||||||
|
if address.selector and address.fragment:
|
||||||
|
raise ReferenceResolutionError("Reference cannot use both fragment and selector")
|
||||||
|
if address.selector:
|
||||||
|
units = _units_from_selector(document, address.selector, target_path)
|
||||||
|
elif address.fragment:
|
||||||
|
units = _units_from_fragment(document, address.fragment, target_path, markdown)
|
||||||
|
else:
|
||||||
|
units = [_document_unit(document, target_path, markdown)]
|
||||||
|
|
||||||
|
if not units:
|
||||||
|
raise ReferenceResolutionError(f"Reference `{address.raw}` did not match any content units")
|
||||||
|
|
||||||
|
return ReferenceResolution(
|
||||||
|
reference=address,
|
||||||
|
source_path=str(source_path),
|
||||||
|
target_path=str(target_path),
|
||||||
|
units=units,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _resolve_target_path(
|
||||||
|
address: ReferenceAddress,
|
||||||
|
context: ReferenceContext,
|
||||||
|
root: Path,
|
||||||
|
source_path: Path,
|
||||||
|
) -> Path:
|
||||||
|
if address.namespace:
|
||||||
|
if address.namespace not in context.namespaces:
|
||||||
|
raise ReferenceResolutionError(f"Unknown namespace `{address.namespace}`")
|
||||||
|
namespace_target = _path_from_namespace(context.namespaces[address.namespace], root)
|
||||||
|
candidate = namespace_target / address.address if namespace_target.is_dir() else namespace_target
|
||||||
|
elif address.address:
|
||||||
|
base_dir = source_path.parent if source_path.is_file() else root
|
||||||
|
candidate = Path(address.address)
|
||||||
|
candidate = candidate if candidate.is_absolute() else base_dir / candidate
|
||||||
|
elif context.current_path:
|
||||||
|
candidate = context.current_path
|
||||||
|
else:
|
||||||
|
raise ReferenceResolutionError("Pathless references require a current document")
|
||||||
|
|
||||||
|
resolved = candidate.resolve()
|
||||||
|
try:
|
||||||
|
resolved.relative_to(root)
|
||||||
|
except ValueError as exc:
|
||||||
|
raise ReferenceResolutionError(f"Reference escapes root: {address.raw}") from exc
|
||||||
|
return resolved
|
||||||
|
|
||||||
|
|
||||||
|
def _path_from_namespace(raw_path: str, root: Path) -> Path:
|
||||||
|
path = Path(raw_path)
|
||||||
|
if not path.is_absolute():
|
||||||
|
path = root / path
|
||||||
|
return path.resolve()
|
||||||
|
|
||||||
|
|
||||||
|
def _units_from_selector(
|
||||||
|
document: Document,
|
||||||
|
selector: str,
|
||||||
|
target_path: Path,
|
||||||
|
) -> list[ContentUnit]:
|
||||||
|
try:
|
||||||
|
matches = query_document(document, selector)
|
||||||
|
except InvalidQueryError as exc:
|
||||||
|
raise ReferenceResolutionError(str(exc)) from exc
|
||||||
|
return [_unit_from_query_match(match, target_path) for match in matches]
|
||||||
|
|
||||||
|
|
||||||
|
def _units_from_fragment(
|
||||||
|
document: Document,
|
||||||
|
fragment: str,
|
||||||
|
target_path: Path,
|
||||||
|
markdown: str,
|
||||||
|
) -> list[ContentUnit]:
|
||||||
|
kind, _, value = fragment.partition(":")
|
||||||
|
if not value:
|
||||||
|
kind, value = "id", kind
|
||||||
|
lookup = _slug(value)
|
||||||
|
|
||||||
|
if kind == "document":
|
||||||
|
return [_document_unit(document, target_path, markdown)]
|
||||||
|
if kind == "id":
|
||||||
|
for units in [
|
||||||
|
_section_units(document, target_path),
|
||||||
|
_region_units(markdown, target_path),
|
||||||
|
_fenced_block_units(markdown, target_path),
|
||||||
|
_heading_units(document, target_path),
|
||||||
|
]:
|
||||||
|
matches = [
|
||||||
|
unit for unit in units if unit.unit_id == lookup or _slug(unit.name or "") == lookup
|
||||||
|
]
|
||||||
|
if matches:
|
||||||
|
return matches
|
||||||
|
return []
|
||||||
|
if kind in {"id", "section"}:
|
||||||
|
sections = _section_units(document, target_path)
|
||||||
|
return [unit for unit in sections if unit.unit_id == lookup or _slug(unit.name or "") == lookup]
|
||||||
|
if kind == "heading":
|
||||||
|
headings = _heading_units(document, target_path)
|
||||||
|
return [unit for unit in headings if unit.unit_id == lookup or _slug(unit.name or "") == lookup]
|
||||||
|
if kind == "block":
|
||||||
|
return _block_fragment_units(document, target_path, value)
|
||||||
|
if kind == "region":
|
||||||
|
return [unit for unit in _region_units(markdown, target_path) if unit.unit_id == lookup]
|
||||||
|
if kind == "fence":
|
||||||
|
return [unit for unit in _fenced_block_units(markdown, target_path) if unit.unit_id == lookup]
|
||||||
|
if kind == "tag":
|
||||||
|
return [
|
||||||
|
unit
|
||||||
|
for unit in _region_units(markdown, target_path) + _fenced_block_units(markdown, target_path)
|
||||||
|
if lookup in {_slug(tag) for tag in unit.metadata.get("tags", [])}
|
||||||
|
]
|
||||||
|
if kind == "line":
|
||||||
|
return _line_range_units(markdown, target_path, value)
|
||||||
|
raise ReferenceResolutionError(f"Unsupported reference fragment kind `{kind}`")
|
||||||
|
|
||||||
|
|
||||||
|
def _document_unit(document: Document, target_path: Path, markdown: str) -> ContentUnit:
|
||||||
|
unit_id = _slug(str(document.frontmatter.get("id") or target_path.stem))
|
||||||
|
return _content_unit(
|
||||||
|
kind="document",
|
||||||
|
unit_id=unit_id,
|
||||||
|
text=markdown,
|
||||||
|
source_path=target_path,
|
||||||
|
span=SourceSpan(1, len(markdown.splitlines())),
|
||||||
|
name=str(document.frontmatter.get("title") or target_path.stem),
|
||||||
|
metadata={"frontmatter": document.frontmatter},
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _unit_from_query_match(match: QueryMatch, target_path: Path) -> ContentUnit:
|
||||||
|
unit_id = _slug(match.path.replace("$.", "").replace("[", "-").replace("]", ""))
|
||||||
|
name = match.text.splitlines()[0].lstrip("# ").strip() if match.text else match.kind
|
||||||
|
return _content_unit(
|
||||||
|
kind=match.kind,
|
||||||
|
unit_id=unit_id,
|
||||||
|
text=match.text if match.text is not None else str(match.value),
|
||||||
|
source_path=target_path,
|
||||||
|
span=SourceSpan(match.line, None),
|
||||||
|
name=name,
|
||||||
|
metadata={"query_path": match.path, "value": match.value},
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _section_units(document: Document, target_path: Path) -> list[ContentUnit]:
|
||||||
|
used_ids: dict[str, int] = {}
|
||||||
|
return [
|
||||||
|
_section_unit(section, target_path, used_ids)
|
||||||
|
for section in document.sections
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
def _section_unit(
|
||||||
|
section: Section,
|
||||||
|
target_path: Path,
|
||||||
|
used_ids: dict[str, int],
|
||||||
|
) -> ContentUnit:
|
||||||
|
title, explicit_id = _heading_title_and_id(section.heading)
|
||||||
|
unit_id = _dedupe_id(_slug(explicit_id or title), used_ids)
|
||||||
|
line_end = section.blocks[-1].line_end if section.blocks else section.heading.line
|
||||||
|
lines = [f"{'#' * section.heading.level} {section.heading.text}"]
|
||||||
|
for block in section.blocks:
|
||||||
|
if block.text:
|
||||||
|
lines.extend(["", block.text])
|
||||||
|
return _content_unit(
|
||||||
|
kind="section",
|
||||||
|
unit_id=unit_id,
|
||||||
|
text="\n".join(lines).strip(),
|
||||||
|
source_path=target_path,
|
||||||
|
span=SourceSpan(section.heading.line, line_end),
|
||||||
|
name=title,
|
||||||
|
metadata={"heading_level": section.heading.level},
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _heading_units(document: Document, target_path: Path) -> list[ContentUnit]:
|
||||||
|
used_ids: dict[str, int] = {}
|
||||||
|
units: list[ContentUnit] = []
|
||||||
|
for heading in document.headings:
|
||||||
|
title, explicit_id = _heading_title_and_id(heading)
|
||||||
|
unit_id = _dedupe_id(_slug(explicit_id or title), used_ids)
|
||||||
|
units.append(
|
||||||
|
_content_unit(
|
||||||
|
kind="heading",
|
||||||
|
unit_id=unit_id,
|
||||||
|
text=f"{'#' * heading.level} {heading.text}",
|
||||||
|
source_path=target_path,
|
||||||
|
span=SourceSpan(heading.line, heading.line),
|
||||||
|
name=title,
|
||||||
|
metadata={"heading_level": heading.level},
|
||||||
|
)
|
||||||
|
)
|
||||||
|
return units
|
||||||
|
|
||||||
|
|
||||||
|
def _block_fragment_units(
|
||||||
|
document: Document,
|
||||||
|
target_path: Path,
|
||||||
|
value: str,
|
||||||
|
) -> list[ContentUnit]:
|
||||||
|
blocks = _block_units(document.blocks, target_path)
|
||||||
|
if value.isdigit():
|
||||||
|
index = int(value)
|
||||||
|
return [blocks[index]] if 0 <= index < len(blocks) else []
|
||||||
|
lookup = _slug(value)
|
||||||
|
return [unit for unit in blocks if unit.unit_id == lookup]
|
||||||
|
|
||||||
|
|
||||||
|
def _block_units(blocks: list[ContentBlock], target_path: Path) -> list[ContentUnit]:
|
||||||
|
used_ids: dict[str, int] = {}
|
||||||
|
units: list[ContentUnit] = []
|
||||||
|
for index, block in enumerate(blocks):
|
||||||
|
base_id = f"{block.type}-{block.line_start or index}"
|
||||||
|
units.append(
|
||||||
|
_content_unit(
|
||||||
|
kind=block.type,
|
||||||
|
unit_id=_dedupe_id(_slug(base_id), used_ids),
|
||||||
|
text=block.text,
|
||||||
|
source_path=target_path,
|
||||||
|
span=SourceSpan(block.line_start, block.line_end),
|
||||||
|
name=block.type,
|
||||||
|
metadata={"block_index": index},
|
||||||
|
)
|
||||||
|
)
|
||||||
|
return units
|
||||||
|
|
||||||
|
|
||||||
|
def _region_units(markdown: str, target_path: Path) -> list[ContentUnit]:
|
||||||
|
lines = markdown.splitlines()
|
||||||
|
units: list[ContentUnit] = []
|
||||||
|
open_region: tuple[int, str, list[str]] | None = None
|
||||||
|
|
||||||
|
for index, line in enumerate(lines, start=1):
|
||||||
|
open_match = _REGION_OPEN_RE.search(line)
|
||||||
|
close_match = _REGION_CLOSE_RE.search(line)
|
||||||
|
if open_match and open_region is not None:
|
||||||
|
raise ReferenceResolutionError("Nested mkt:region blocks are not supported")
|
||||||
|
if close_match:
|
||||||
|
if open_region is None:
|
||||||
|
raise ReferenceResolutionError("Region close marker has no matching open marker")
|
||||||
|
start_line, region_id, tags = open_region
|
||||||
|
content_lines = lines[start_line:index - 1]
|
||||||
|
units.append(
|
||||||
|
_content_unit(
|
||||||
|
kind="region",
|
||||||
|
unit_id=_slug(region_id),
|
||||||
|
text="\n".join(content_lines).strip(),
|
||||||
|
source_path=target_path,
|
||||||
|
span=SourceSpan(start_line, index),
|
||||||
|
name=region_id,
|
||||||
|
metadata={"tags": tags},
|
||||||
|
)
|
||||||
|
)
|
||||||
|
open_region = None
|
||||||
|
continue
|
||||||
|
if open_match:
|
||||||
|
attrs = _parse_attrs(open_match.group("attrs"))
|
||||||
|
region_id = attrs.get("id")
|
||||||
|
if not region_id:
|
||||||
|
raise ReferenceResolutionError("Region marker requires an id attribute")
|
||||||
|
open_region = (index, region_id, _tags_from_attrs(attrs))
|
||||||
|
|
||||||
|
if open_region is not None:
|
||||||
|
raise ReferenceResolutionError("Region open marker has no matching close marker")
|
||||||
|
return units
|
||||||
|
|
||||||
|
|
||||||
|
def _fenced_block_units(markdown: str, target_path: Path) -> list[ContentUnit]:
|
||||||
|
parser = MarkdownIt("commonmark", {"tables": True}).enable("table")
|
||||||
|
units: list[ContentUnit] = []
|
||||||
|
used_ids: dict[str, int] = {}
|
||||||
|
for index, token in enumerate(parser.parse(markdown)):
|
||||||
|
if token.type != "fence":
|
||||||
|
continue
|
||||||
|
attrs = _parse_fence_info(token.info)
|
||||||
|
unit_id = attrs.get("id")
|
||||||
|
if not unit_id:
|
||||||
|
continue
|
||||||
|
line_start = token.map[0] + 1 if token.map else None
|
||||||
|
line_end = token.map[1] if token.map else None
|
||||||
|
units.append(
|
||||||
|
_content_unit(
|
||||||
|
kind="fenced_block",
|
||||||
|
unit_id=_dedupe_id(_slug(unit_id), used_ids),
|
||||||
|
text=token.content,
|
||||||
|
source_path=target_path,
|
||||||
|
span=SourceSpan(line_start, line_end),
|
||||||
|
name=unit_id,
|
||||||
|
metadata={
|
||||||
|
"language": attrs.get("language"),
|
||||||
|
"tags": _tags_from_attrs(attrs),
|
||||||
|
"attrs": {
|
||||||
|
key: value
|
||||||
|
for key, value in attrs.items()
|
||||||
|
if key not in {"id", "language", "tag", "tags"}
|
||||||
|
},
|
||||||
|
"block_index": index,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
)
|
||||||
|
return units
|
||||||
|
|
||||||
|
|
||||||
|
def _line_range_units(markdown: str, target_path: Path, value: str) -> list[ContentUnit]:
|
||||||
|
match = re.match(r"^(?P<start>\d+)(?:-(?P<end>\d+))?$", value)
|
||||||
|
if not match:
|
||||||
|
raise ReferenceResolutionError("Line fragments must use `line:start` or `line:start-end`")
|
||||||
|
start = int(match.group("start"))
|
||||||
|
end = int(match.group("end") or start)
|
||||||
|
lines = markdown.splitlines()
|
||||||
|
if start < 1 or end < start or end > len(lines):
|
||||||
|
return []
|
||||||
|
text = "\n".join(lines[start - 1:end])
|
||||||
|
return [
|
||||||
|
_content_unit(
|
||||||
|
kind="line_range",
|
||||||
|
unit_id=f"line-{start}-{end}",
|
||||||
|
text=text,
|
||||||
|
source_path=target_path,
|
||||||
|
span=SourceSpan(start, end),
|
||||||
|
name=f"lines {start}-{end}",
|
||||||
|
metadata={},
|
||||||
|
)
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_fence_info(info: str) -> dict[str, str]:
|
||||||
|
match = _FENCE_ATTRS_RE.match(info.strip())
|
||||||
|
if not match:
|
||||||
|
return {"language": info.strip()} if info.strip() else {}
|
||||||
|
attrs = _parse_attrs(match.group("attrs") or "")
|
||||||
|
language = match.group("language")
|
||||||
|
if language:
|
||||||
|
attrs["language"] = language
|
||||||
|
if "id" not in attrs and attrs:
|
||||||
|
for key in list(attrs):
|
||||||
|
if key.startswith("#"):
|
||||||
|
attrs["id"] = key[1:]
|
||||||
|
del attrs[key]
|
||||||
|
break
|
||||||
|
return attrs
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_attrs(raw: str) -> dict[str, str]:
|
||||||
|
attrs: dict[str, str] = {}
|
||||||
|
for part in shlex.split(raw):
|
||||||
|
if part.startswith("#") and len(part) > 1:
|
||||||
|
attrs["id"] = part[1:]
|
||||||
|
continue
|
||||||
|
if "=" not in part:
|
||||||
|
attrs[part] = "true"
|
||||||
|
continue
|
||||||
|
key, value = part.split("=", 1)
|
||||||
|
attrs[key.strip()] = value.strip()
|
||||||
|
return attrs
|
||||||
|
|
||||||
|
|
||||||
|
def _tags_from_attrs(attrs: dict[str, str]) -> list[str]:
|
||||||
|
raw = attrs.get("tags") or attrs.get("tag") or ""
|
||||||
|
return [tag.strip() for tag in re.split(r"[, ]+", raw) if tag.strip()]
|
||||||
|
|
||||||
|
|
||||||
|
def _content_unit(
|
||||||
|
*,
|
||||||
|
kind: str,
|
||||||
|
unit_id: str,
|
||||||
|
text: str,
|
||||||
|
source_path: Path,
|
||||||
|
span: SourceSpan | None,
|
||||||
|
name: str | None,
|
||||||
|
metadata: dict[str, Any] | None = None,
|
||||||
|
) -> ContentUnit:
|
||||||
|
return ContentUnit(
|
||||||
|
kind=kind,
|
||||||
|
unit_id=unit_id,
|
||||||
|
text=text,
|
||||||
|
source_path=str(source_path),
|
||||||
|
span=span,
|
||||||
|
name=name,
|
||||||
|
content_hash="sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest(),
|
||||||
|
metadata=metadata or {},
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _heading_title_and_id(heading: Heading) -> tuple[str, str | None]:
|
||||||
|
match = _HEADING_ID_RE.match(heading.text.strip())
|
||||||
|
if not match:
|
||||||
|
return heading.text.strip(), None
|
||||||
|
return match.group("title").strip(), match.group("id")
|
||||||
|
|
||||||
|
|
||||||
|
def _dedupe_id(unit_id: str, used_ids: dict[str, int]) -> str:
|
||||||
|
count = used_ids.get(unit_id, 0) + 1
|
||||||
|
used_ids[unit_id] = count
|
||||||
|
return unit_id if count == 1 else f"{unit_id}-{count}"
|
||||||
|
|
||||||
|
|
||||||
|
def _slug(value: str) -> str:
|
||||||
|
slug = re.sub(r"[^a-z0-9_.:-]+", "-", value.strip().lower())
|
||||||
|
slug = re.sub(r"-+", "-", slug).strip("-")
|
||||||
|
return slug or "unit"
|
||||||
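Most of the fragment grammar handled above is plain string processing, so it can be checked without a parsed `Document`. The sketch below restates three of those rules as standalone functions: the `kind:value` split (a bare value falls back to `id`), the slug applied to every `unit_id`, and the inclusive, 1-based `line:start-end` slice. The names `slug`, `split_fragment`, and `slice_lines` are hypothetical; they copy the logic of `_slug`, `_units_from_fragment`, and `_line_range_units` rather than importing them.

```python
from __future__ import annotations

import re


def slug(value: str) -> str:
    # Same rule as _slug: lowercase, collapse runs outside [a-z0-9_.:-] to "-".
    result = re.sub(r"[^a-z0-9_.:-]+", "-", value.strip().lower())
    result = re.sub(r"-+", "-", result).strip("-")
    return result or "unit"


def split_fragment(fragment: str) -> tuple[str, str]:
    # "section:fees" -> ("section", "fees"); a bare "fees" is treated as an id.
    kind, _, value = fragment.partition(":")
    if not value:
        kind, value = "id", kind
    return kind, value


def slice_lines(markdown: str, value: str) -> str | None:
    # Accepts "start" or "start-end"; both ends are 1-based and inclusive.
    match = re.match(r"^(?P<start>\d+)(?:-(?P<end>\d+))?$", value)
    if not match:
        raise ValueError("line fragments must use `line:start` or `line:start-end`")
    start = int(match.group("start"))
    end = int(match.group("end") or start)
    lines = markdown.splitlines()
    if start < 1 or end < start or end > len(lines):
        return None  # out-of-range ranges match nothing instead of raising
    return "\n".join(lines[start - 1:end])
```

Because an out-of-range line fragment yields no units rather than an error, `resolve_reference` ends up reporting that the reference "did not match any content units".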
106
tests/test_content_class_resolution.py
Normal file
@@ -0,0 +1,106 @@
from pathlib import Path

from click.testing import CliRunner

from markitect_tool.cli import main
from markitect_tool.content_class import load_content_classes


def test_c3_linearization_for_diamond_inheritance():
    registry = load_content_classes(
        {
            "classes": {
                "base": {"slots": {"sections": ["Overview"]}},
                "left": {"extends": ["base"], "slots": {"sections": ["Left"]}},
                "right": {"extends": ["base"], "slots": {"sections": ["Right"]}},
                "leaf": {"extends": ["left", "right"], "slots": {"title": "Leaf"}},
            }
        }
    )

    assert registry.linearize("leaf") == ["leaf", "left", "right", "base"]


def test_compose_merges_slots_with_explicit_policies():
    registry = load_content_classes(
        {
            "classes": {
                "base": {
                    "slots": {
                        "sections": ["Overview"],
                        "assertions": {"tone": "plain", "depth": "short"},
                    }
                },
                "market": {
                    "extends": ["base"],
                    "slots": {
                        "sections": ["Pricing"],
                        "assertions": {"depth": "detailed"},
                    },
                    "merge_policies": {
                        "sections": "append",
                        "assertions": "deep_merge",
                    },
                },
                "instance": {
                    "extends": ["market"],
                    "slots": {"sections": ["Risks"]},
                    "merge_policies": {"sections": "append"},
                },
            }
        }
    )

    result = registry.compose("instance")

    assert result.valid
    assert result.slots["sections"] == ["Overview", "Pricing", "Risks"]
    assert result.slots["assertions"] == {"tone": "plain", "depth": "detailed"}


def test_compose_reports_error_on_conflict():
    registry = load_content_classes(
        {
            "classes": {
                "base": {"slots": {"owner": "A"}},
                "instance": {
                    "extends": ["base"],
                    "slots": {"owner": "B"},
                    "merge_policies": {"owner": "error_on_conflict"},
                },
            }
        }
    )

    result = registry.compose("instance")

    assert not result.valid
    assert result.diagnostics[0].code == "content_class.merge_conflict"


def test_mkt_class_resolve_outputs_text(tmp_path: Path):
    class_file = tmp_path / "classes.yaml"
    class_file.write_text(
        """classes:
  base:
    slots:
      sections:
        - Overview
  instance:
    extends:
      - base
    slots:
      sections:
        - Risks
    merge_policies:
      sections: append
""",
        encoding="utf-8",
    )

    result = CliRunner().invoke(main, ["class", "resolve", str(class_file), "instance"])

    assert result.exit_code == 0
    assert "linearization: instance -> base" in result.output
    assert "Overview" in result.output
    assert "Risks" in result.output
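The diamond test pins the C3 order to `leaf, left, right, base`. For readers unfamiliar with C3, here is a minimal standalone sketch of the algorithm, assuming each class is described only by its ordered parent list. `c3_linearize` and `_merge` are hypothetical names and this is not the `load_content_classes` implementation; it simply raises `ValueError` for the three failure cases `docs/content-classes.md` lists: unknown parents, cycles, and inconsistent precedence.

```python
from __future__ import annotations


def _merge(sequences: list[list[str]]) -> list[str]:
    merged: list[str] = []
    pending = [seq for seq in sequences if seq]
    while pending:
        for seq in pending:
            head = seq[0]
            # A head can be emitted only if no sequence lists it after its own head.
            if not any(head in other[1:] for other in pending):
                break
        else:
            raise ValueError("inconsistent precedence order")
        merged.append(head)
        pending = [seq[1:] if seq[0] == head else seq for seq in pending]
        pending = [seq for seq in pending if seq]
    return merged


def c3_linearize(name: str, parents: dict[str, list[str]]) -> list[str]:
    """Return `name` followed by its ancestors, most specific first."""
    visiting: set[str] = set()

    def linearize(cls: str) -> list[str]:
        if cls not in parents:
            raise ValueError(f"unknown class {cls!r}")
        if cls in visiting:
            raise ValueError(f"inheritance cycle through {cls!r}")
        visiting.add(cls)
        bases = parents[cls]
        # C3: the class, then a merge of each parent's linearization and the parent list.
        order = [cls] + _merge([linearize(base) for base in bases] + [list(bases)])
        visiting.discard(cls)
        return order

    return linearize(name)
```

Merging slots "from base to leaf" then amounts to walking this list in reverse, so the leaf's own slot values are applied last.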
93
tests/test_explode_implode.py
Normal file
@@ -0,0 +1,93 @@
from pathlib import Path

import pytest
from click.testing import CliRunner

from markitect_tool.cli import main
from markitect_tool.explode import (
    EXPLODE_MANIFEST_NAME,
    ExplodeError,
    explode_markdown_file,
    implode_markdown_directory,
)


ROUNDTRIP_DOC = """---
title: Explode Example
---

Opening text before the first heading.

# Intro

Intro body.

## Detail

Detail body.

# Later

Later body.
"""


def test_flat_explode_implode_roundtrips_exact_markdown(tmp_path: Path):
    source = tmp_path / "source.md"
    output_dir = tmp_path / "exploded"
    source.write_text(ROUNDTRIP_DOC, encoding="utf-8")

    result = explode_markdown_file(source, output_dir, variant="flat")
    imploded = implode_markdown_directory(output_dir)

    assert Path(result.manifest_path).name == EXPLODE_MANIFEST_NAME
    assert (output_dir / "00-preamble.md").exists()
    assert (output_dir / "sections" / "01-intro.md").exists()
    assert imploded.markdown == ROUNDTRIP_DOC
    assert imploded.current_hash == result.manifest.source_hash


def test_hierarchical_explode_places_child_sections_under_parent(tmp_path: Path):
    source = tmp_path / "source.md"
    output_dir = tmp_path / "exploded"
    source.write_text(ROUNDTRIP_DOC, encoding="utf-8")

    result = explode_markdown_file(source, output_dir, variant="hierarchical")

    files = {Path(path).relative_to(output_dir).as_posix() for path in result.written_files}
    assert "01-intro.md" in files
    assert "01-intro/02-detail.md" in files
    assert implode_markdown_directory(output_dir).markdown == ROUNDTRIP_DOC


def test_explode_rejects_non_empty_output_without_force(tmp_path: Path):
    source = tmp_path / "source.md"
    output_dir = tmp_path / "exploded"
    output_dir.mkdir()
    (output_dir / "existing.md").write_text("Existing", encoding="utf-8")
    source.write_text(ROUNDTRIP_DOC, encoding="utf-8")

    with pytest.raises(ExplodeError, match="not empty"):
        explode_markdown_file(source, output_dir)


def test_mkt_explode_and_implode(tmp_path: Path):
    source = tmp_path / "source.md"
    output_dir = tmp_path / "exploded"
    rebuilt = tmp_path / "rebuilt.md"
    source.write_text(ROUNDTRIP_DOC, encoding="utf-8")
    runner = CliRunner()

    explode_result = runner.invoke(
        main,
        ["explode", str(source), "--output-dir", str(output_dir), "--variant", "flat"],
    )
    implode_result = runner.invoke(
        main,
        ["implode", str(output_dir), "--output", str(rebuilt)],
    )

    assert explode_result.exit_code == 0
    assert "entries: 4" in explode_result.output
    assert implode_result.exit_code == 0
    assert rebuilt.read_text(encoding="utf-8") == ROUNDTRIP_DOC
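The round-trip tests rely on one invariant: cutting the source at heading boundaries and concatenating the pieces must reproduce the input byte for byte. A minimal sketch of that invariant follows, assuming ATX headings only; unlike a real parser it does not skip `#` lines inside fenced code or recognize Setext headings, and `split_at_headings` is a hypothetical helper, not part of `markitect_tool.explode`.

```python
from __future__ import annotations

import re

# ATX heading at the start of a line; fenced code is NOT excluded in this sketch.
HEADING_RE = re.compile(r"^#{1,6} ", re.MULTILINE)


def split_at_headings(markdown: str) -> list[str]:
    """Cut before every heading; text before the first heading becomes the preamble."""
    cuts = {0, len(markdown)}
    cuts.update(match.start() for match in HEADING_RE.finditer(markdown))
    ordered = sorted(cuts)
    # Pieces are contiguous slices, so "".join(pieces) == markdown by construction.
    return [markdown[start:end] for start, end in zip(ordered, ordered[1:])]
```

For `ROUNDTRIP_DOC` this yields the preamble plus three sections, which lines up with the `entries: 4` the CLI test expects.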
91
tests/test_literate_weave_tangle.py
Normal file
@@ -0,0 +1,91 @@
from pathlib import Path

from click.testing import CliRunner

from markitect_tool.cli import main
from markitect_tool.literate import (
    discover_code_chunks,
    tangle_markdown,
    weave_markdown,
    write_tangle_files,
)


LITERATE_DOC = """# Literate Example

```python {#helpers}
def helper():
    return "ready"
```

```python {#main tangle="src/app.py"}
<<helpers>>

def main():
    return helper()
```
"""


def test_discover_code_chunks_with_references_and_targets():
    chunks = discover_code_chunks(LITERATE_DOC, source_path="example.md")

    assert [chunk.chunk_id for chunk in chunks] == ["helpers", "main"]
    assert chunks[1].target_path == "src/app.py"
    assert chunks[1].references == ["helpers"]


def test_tangle_expands_named_chunk_references():
    result = tangle_markdown(LITERATE_DOC, source_path="example.md")

    assert result.valid
    assert len(result.files) == 1
    assert result.files[0].path == "src/app.py"
    assert "def helper" in result.files[0].content
    assert "<<helpers>>" not in result.files[0].content
    assert result.provenance[0].operation == "literate.tangle"


def test_tangle_reports_missing_chunk_reference():
    markdown = """```python {#main tangle="src/app.py"}
<<missing>>
```
"""

    result = tangle_markdown(markdown, source_path="example.md")

    assert not result.valid
    assert result.diagnostics[0].code == "literate.missing_chunk"


def test_weave_appends_chunk_index():
    result = weave_markdown(LITERATE_DOC, source_path="example.md")

    assert "## Code Chunk Index" in result.markdown
    assert "`main` -> `src/app.py`; refs: `helpers`" in result.markdown


def test_write_tangle_files(tmp_path: Path):
    result = tangle_markdown(LITERATE_DOC, source_path="example.md")

    written = write_tangle_files(result, tmp_path)

    assert written == [str(tmp_path / "src" / "app.py")]
    assert "def main" in (tmp_path / "src" / "app.py").read_text(encoding="utf-8")


def test_mkt_tangle_and_weave(tmp_path: Path):
    source = tmp_path / "literate.md"
    output_dir = tmp_path / "out"
    woven = tmp_path / "woven.md"
    source.write_text(LITERATE_DOC, encoding="utf-8")
    runner = CliRunner()

    tangle_result = runner.invoke(main, ["tangle", str(source), "--output-dir", str(output_dir)])
    weave_result = runner.invoke(main, ["weave", str(source), "--output", str(woven)])

    assert tangle_result.exit_code == 0
    assert "files: 1" in tangle_result.output
    assert (output_dir / "src" / "app.py").exists()
    assert weave_result.exit_code == 0
    assert "## Code Chunk Index" in woven.read_text(encoding="utf-8")
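`test_tangle_expands_named_chunk_references` checks that `<<helpers>>` is replaced by the body of the `helpers` chunk. A minimal sketch of that noweb-style expansion follows, assuming a reference must sit alone on its line and that its indentation applies to every expanded line. `expand_chunk` is a hypothetical helper, not the `markitect_tool.literate` implementation; it raises `KeyError` or `ValueError` where `tangle_markdown` reports diagnostics such as `literate.missing_chunk`.

```python
from __future__ import annotations

import re

# A reference line: optional indentation, then <<chunk-name>>, nothing else.
REF_RE = re.compile(r"^(?P<indent>[ \t]*)<<(?P<name>[^<>]+)>>[ \t]*$")


def expand_chunk(name: str, chunks: dict[str, str], _stack: tuple[str, ...] = ()) -> str:
    if name not in chunks:
        raise KeyError(f"missing chunk {name!r}")
    if name in _stack:
        raise ValueError(f"chunk cycle: {' -> '.join(_stack + (name,))}")
    out: list[str] = []
    for line in chunks[name].splitlines():
        match = REF_RE.match(line)
        if match is None:
            out.append(line)
            continue
        inner = expand_chunk(match.group("name"), chunks, _stack + (name,))
        indent = match.group("indent")
        # Re-indent the expansion to the reference's column; keep blank lines blank.
        out.extend(indent + inner_line if inner_line else "" for inner_line in inner.splitlines())
    return "\n".join(out)
```

Tracking the chain of chunk names in `_stack` lets the sketch report a self-referencing chunk as a cycle instead of recursing until Python's recursion limit.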
@@ -34,6 +34,27 @@ title: Original
    assert "## Intro" in result.markdown
    assert "### Detail" in result.markdown
    assert result.operations == ["set_frontmatter", "shift_headings:1"]
    assert [event.operation for event in result.provenance] == [
        "set_frontmatter",
        "shift_headings",
    ]


def test_transform_shifts_headings_without_touching_fenced_code():
    markdown = """# Intro

```markdown
# Literal Heading
```

## Real Heading
"""

    result = transform_markdown(markdown, heading_delta=1)

    assert "```markdown\n# Literal Heading\n```" in result.markdown
    assert "### Real Heading" in result.markdown
    assert result.provenance[0].metadata["affected_lines"] == [1, 7]


def test_transform_extracts_selector_text():
@@ -104,6 +125,25 @@ def test_resolve_includes_supports_brace_shorthand(tmp_path: Path):
    assert "Before" in result.markdown
    assert "Included body." in result.markdown
    assert "After" in result.markdown
    assert result.provenance[0].operation == "include"
    assert result.provenance[0].target_path == str(partial.resolve())


def test_resolve_includes_ignores_markers_inside_fenced_code(tmp_path: Path):
    partial = tmp_path / "partial.md"
    partial.write_text("Included body.", encoding="utf-8")
    markdown = """```markdown
{{include:partial.md}}
```

{{include:partial.md}}
"""

    result = resolve_includes(markdown, base_dir=tmp_path)

    assert result.markdown.count("Included body.") == 1
    assert "{{include:partial.md}}" in result.markdown
    assert result.included_paths == [str(partial.resolve())]


def test_resolve_includes_rejects_cycles(tmp_path: Path):
105
tests/test_processor_registry.py
Normal file
@@ -0,0 +1,105 @@
from pathlib import Path

from click.testing import CliRunner

from markitect_tool.cli import main
from markitect_tool.core import parse_markdown
from markitect_tool.processor import (
    ProcessorContext,
    default_processor_registry,
    discover_fenced_processors,
    run_fenced_processors,
)
from markitect_tool.reference import load_namespaces


def test_discover_fenced_processors_from_language_prefix():
    markdown = """# Doc

```mkt-uppercase {#shout}
hello
```
"""

    blocks = discover_fenced_processors(markdown, source_path="doc.md")

    assert len(blocks) == 1
    assert blocks[0].processor == "uppercase"
    assert blocks[0].unit_id == "shout"
    assert blocks[0].line_start == 3


def test_default_registry_runs_uppercase_processor():
    markdown = """```mkt-uppercase {#shout}
hello
```
"""
    context = ProcessorContext()

    run = run_fenced_processors(markdown, context=context)

    assert run.valid
    assert run.results[0].content == "HELLO\n"
    assert run.results[0].provenance[0].operation == "processor.uppercase"


def test_include_processor_uses_reference_resolver(tmp_path: Path):
    source = tmp_path / "doc.md"
    partial = tmp_path / "partial.md"
    source.write_text(
        """---
namespaces:
  local: .
---

```mkt-include {#intro ref="local:partial.md#summary"}
```
""",
        encoding="utf-8",
    )
    partial.write_text("# Partial\n\n## Summary\n\nIncluded summary.\n", encoding="utf-8")
    document = parse_markdown(source.read_text(encoding="utf-8"), source_path=str(source))
    context = ProcessorContext(
        root=tmp_path,
        current_path=source,
        namespaces=load_namespaces(document.frontmatter),
    )

    run = run_fenced_processors(source.read_text(encoding="utf-8"), context=context)

    assert run.valid
    assert run.results[0].dependencies == [str(partial.resolve())]
    assert "Included summary" in run.results[0].content


def test_unknown_processor_returns_diagnostic():
    markdown = """```mkt-nope {#x}
content
```
"""
    registry = default_processor_registry()

    run = run_fenced_processors(markdown, context=ProcessorContext(), registry=registry)

    assert not run.valid
    assert run.results[0].diagnostics[0].code == "processor.unknown"


def test_mkt_process_outputs_text(tmp_path: Path):
    source = tmp_path / "doc.md"
    source.write_text(
        """# Doc

```mkt-uppercase {#shout}
hello
```
""",
        encoding="utf-8",
    )

    result = CliRunner().invoke(main, ["process", str(source), "--root", str(tmp_path)])

    assert result.exit_code == 0
    assert "valid" in result.output
    assert "uppercase shout" in result.output
    assert "HELLO" in result.output
195
tests/test_reference_resolution.py
Normal file
@@ -0,0 +1,195 @@
from pathlib import Path

import pytest
from click.testing import CliRunner

from markitect_tool.cli import main
from markitect_tool.core import parse_markdown
from markitect_tool.reference import (
    ReferenceContext,
    ReferenceResolutionError,
    load_namespaces,
    parse_reference,
    resolve_reference,
)


def test_parse_reference_splits_namespace_fragment_and_selector():
    address = parse_reference("std:clauses/payment.md#section:fees::blocks[type=code]")

    assert address.namespace == "std"
    assert address.address == "clauses/payment.md"
    assert address.fragment == "section:fees"
    assert address.selector == "blocks[type=code]"


def test_load_namespaces_accepts_optional_colon_suffix():
    namespaces = load_namespaces({"namespaces": {"std:": "./standard", "src": "../src"}})

    assert namespaces == {"std": "./standard", "src": "../src"}


def test_resolve_path_reference_returns_document_unit(tmp_path: Path):
    context_file = tmp_path / "context.md"
    target_file = tmp_path / "target.md"
    context_file.write_text("# Context\n", encoding="utf-8")
    target_file.write_text("---\nid: target-doc\ntitle: Target\n---\n\n# Target\n\nBody.", encoding="utf-8")
    context = ReferenceContext(root=tmp_path, current_path=context_file)

    resolution = resolve_reference("target.md", context=context)

    assert resolution.target_path == str(target_file.resolve())
    assert len(resolution.units) == 1
    assert resolution.units[0].kind == "document"
    assert resolution.units[0].unit_id == "target-doc"
    assert "# Target" in resolution.units[0].text


def test_resolve_namespace_reference_and_explicit_section_id(tmp_path: Path):
    standard = tmp_path / "standard"
    standard.mkdir()
    context_file = tmp_path / "context.md"
    clause_file = standard / "clauses.md"
    context_file.write_text(
        "---\nnamespaces:\n std: ./standard\n---\n\n# Context\n",
        encoding="utf-8",
    )
    clause_file.write_text(
        "# Clauses\n\n## Payment Terms {#payment-terms}\n\nPay within 30 days.\n",
        encoding="utf-8",
    )
    document = parse_markdown(context_file.read_text(encoding="utf-8"), source_path=str(context_file))
    context = ReferenceContext.from_document(document, root=tmp_path)

    resolution = resolve_reference("std:clauses.md#section:payment-terms", context=context)

    assert resolution.units[0].kind == "section"
    assert resolution.units[0].unit_id == "payment-terms"
    assert resolution.units[0].name == "Payment Terms"
    assert "Pay within 30 days" in resolution.units[0].text


def test_resolve_selector_reference_uses_existing_query_engine(tmp_path: Path):
    standard = tmp_path / "standard"
    standard.mkdir()
    context_file = tmp_path / "context.md"
    source_file = standard / "clauses.md"
    context_file.write_text(
        "---\nnamespaces:\n std: ./standard\n---\n\n# Context\n",
        encoding="utf-8",
    )
    source_file.write_text(
        "# Clauses\n\n## Warranty\n\nWarranty text.\n\n## Liability\n\nLiability text.\n",
        encoding="utf-8",
    )
    context = ReferenceContext.from_document(parse_markdown(context_file.read_text(encoding="utf-8"), str(context_file)), root=tmp_path)

    resolution = resolve_reference("std:clauses.md::sections[heading=Warranty]", context=context)

    assert [unit.kind for unit in resolution.units] == ["section"]
    assert resolution.units[0].name == "Warranty"
    assert "Liability" not in resolution.units[0].text


def test_resolve_pathless_fragment_uses_current_document(tmp_path: Path):
    context_file = tmp_path / "context.md"
    context_file.write_text("# Context\n\n## Overview\n\nUseful local context.\n", encoding="utf-8")
    context = ReferenceContext(root=tmp_path, current_path=context_file)

    resolution = resolve_reference("#overview", context=context)

    assert resolution.target_path == str(context_file.resolve())
    assert resolution.units[0].kind == "section"
    assert resolution.units[0].unit_id == "overview"
    assert "Useful local context" in resolution.units[0].text


def test_resolve_named_region_by_id_and_tag(tmp_path: Path):
    context_file = tmp_path / "context.md"
    context_file.write_text(
        """# Context

<!-- mkt:region id="overview" tags="reuse summary" -->
Reusable region text.
<!-- /mkt:region -->
""",
        encoding="utf-8",
    )
    context = ReferenceContext(root=tmp_path, current_path=context_file)

    by_id = resolve_reference("#region:overview", context=context)
    by_tag = resolve_reference("#tag:summary", context=context)

    assert by_id.units[0].kind == "region"
    assert by_id.units[0].text == "Reusable region text."
    assert by_tag.units[0].unit_id == "overview"


def test_resolve_fenced_block_by_id(tmp_path: Path):
    context_file = tmp_path / "context.md"
    context_file.write_text(
        """# Context

```python {#load-config tags="code setup" tangle="src/config.py"}
def load_config():
    return {}
```
""",
        encoding="utf-8",
    )
    context = ReferenceContext(root=tmp_path, current_path=context_file)

    resolution = resolve_reference("#fence:load-config", context=context)

    assert resolution.units[0].kind == "fenced_block"
    assert resolution.units[0].unit_id == "load-config"
    assert resolution.units[0].metadata["language"] == "python"
    assert resolution.units[0].metadata["attrs"]["tangle"] == "src/config.py"
    assert "def load_config" in resolution.units[0].text


def test_resolve_line_range_fragment(tmp_path: Path):
    context_file = tmp_path / "context.md"
    context_file.write_text("# Context\n\nLine A\nLine B\nLine C\n", encoding="utf-8")
    context = ReferenceContext(root=tmp_path, current_path=context_file)

    resolution = resolve_reference("#line:3-4", context=context)

    assert resolution.units[0].kind == "line_range"
    assert resolution.units[0].span.line_start == 3
    assert resolution.units[0].text == "Line A\nLine B"


def test_resolve_rejects_unknown_namespace(tmp_path: Path):
    context_file = tmp_path / "context.md"
    context_file.write_text("# Context\n", encoding="utf-8")
    context = ReferenceContext(root=tmp_path, current_path=context_file)

    with pytest.raises(ReferenceResolutionError, match="Unknown namespace"):
        resolve_reference("missing:doc.md", context=context)


def test_resolve_rejects_paths_outside_root(tmp_path: Path):
    context_file = tmp_path / "context.md"
    context_file.write_text("# Context\n", encoding="utf-8")
    context = ReferenceContext(root=tmp_path, current_path=context_file)

    with pytest.raises(ReferenceResolutionError, match="escapes root"):
        resolve_reference("../outside.md", context=context)


def test_mkt_ref_resolve_outputs_text(tmp_path: Path):
    context_file = tmp_path / "context.md"
    target_file = tmp_path / "target.md"
    context_file.write_text("# Context\n", encoding="utf-8")
    target_file.write_text("# Target\n\n## Decision\n\nChosen.", encoding="utf-8")

    result = CliRunner().invoke(
        main,
        ["ref", "resolve", str(context_file), "target.md#decision", "--root", str(tmp_path)],
    )

    assert result.exit_code == 0
    assert "1 unit(s)" in result.output
    assert "section decision" in result.output
    assert "Decision" in result.output
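The first tests in this file define the reference grammar: an optional `namespace:` prefix, a path, an optional `#fragment`, and an optional `::selector`. Below is a minimal sketch of that parsing. It assumes exactly this grammar. `Address`, `parse_reference`, and `load_namespaces` here are hypothetical stand-ins written for illustration, not the `markitect_tool.reference` implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar, inferred from the tests:
#   [namespace:]address[#fragment][::selector]
# A namespace is an identifier followed by ":" (but not "://", to avoid URLs).
_NAMESPACE = re.compile(r"^([A-Za-z][A-Za-z0-9_-]*):(?!//)(.*)$")


@dataclass(frozen=True)
class Address:
    namespace: Optional[str]
    address: str
    fragment: Optional[str]
    selector: Optional[str]


def parse_reference(ref: str) -> Address:
    # Split off the selector first: fragments such as "section:fees" contain
    # single colons, but only "::" introduces a selector.
    head, sel_sep, selector = ref.partition("::")
    # Everything after the first "#" is the fragment.
    target, hash_sep, fragment = head.partition("#")
    namespace = None
    match = _NAMESPACE.match(target)
    if match:
        namespace, target = match.group(1), match.group(2)
    return Address(
        namespace=namespace,
        address=target,
        fragment=fragment if hash_sep else None,
        selector=selector if sel_sep else None,
    )


def load_namespaces(frontmatter: dict) -> dict:
    # Accept both "std" and "std:" as keys; normalize to the bare name.
    raw = frontmatter.get("namespaces") or {}
    return {str(key).rstrip(":"): str(value) for key, value in raw.items()}
```

Splitting on `::` before `#` keeps a fragment such as `section:fees` intact. A pathless reference like `#overview` yields an empty `address`, which a resolver can treat as "the current document".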
60
tests/test_wp0010_migration_examples.py
Normal file
@@ -0,0 +1,60 @@
from pathlib import Path

from markitect_tool.core import parse_markdown_file
from markitect_tool.explode import explode_markdown_file, implode_markdown_directory
from markitect_tool.ops import resolve_includes
from markitect_tool.processor import ProcessorContext, run_fenced_processors
from markitect_tool.reference import load_namespaces
from markitect_tool.literate import tangle_markdown


EXAMPLES = Path("examples/migration")


def test_migration_explode_example_roundtrips(tmp_path: Path):
    source = EXAMPLES / "legacy-explode-source.md"
    original = source.read_text(encoding="utf-8")

    explode_markdown_file(source, tmp_path / "exploded", variant="hierarchical")
    result = implode_markdown_directory(tmp_path / "exploded")

    assert result.markdown == original


def test_migration_reference_backed_transclusion_example():
    source = EXAMPLES / "legacy-transclusion-context.md"
    document = parse_markdown_file(source)
    context = ProcessorContext(
        root=EXAMPLES,
        current_path=source,
        namespaces=load_namespaces(document.frontmatter),
    )

    result = run_fenced_processors(source.read_text(encoding="utf-8"), context=context)

    assert result.valid
    assert "Payment is due within 30 days" in result.results[0].content


def test_migration_path_include_example():
    source = EXAMPLES / "legacy-path-include.md"

    result = resolve_includes(
        source.read_text(encoding="utf-8"),
        base_dir=EXAMPLES,
        current_path=source,
    )

    assert "## Warranty" in result.markdown
    assert "Warranty begins on the effective date" in result.markdown


def test_migration_literate_example_tangles():
    source = EXAMPLES / "legacy-literate.md"

    result = tangle_markdown(source.read_text(encoding="utf-8"), source_path=source)

    assert result.valid
    assert result.files[0].path == "src/app.py"
    assert "CONFIG" in result.files[0].content
    assert "<<config>>" not in result.files[0].content
@@ -3,7 +3,7 @@ id: MKTT-WP-0010
 type: workplan
 title: "Content References, Processors, and Literate Workflows"
 domain: markitect
-status: todo
+status: done
 owner: markitect-tool
 topic_slug: markitect
 planning_priority: P1
@@ -55,7 +55,7 @@ See `docs/content-reference-literate-workflow-research.md`.
 
 ```task
 id: MKTT-WP-0010-T001
-status: todo
+status: done
 priority: high
 state_hub_task_id: "f70d2b9d-151b-46c6-9613-bd6bdbf164e7"
 ```
@@ -66,11 +66,18 @@ resolver inputs/outputs, and error cases.
 Output: reference model docs, examples, and tests for path, namespace, selector,
 and ID resolution.
 
+Initial implementation completed with a `reference` extension package,
+frontmatter namespace loading, root-bounded path resolution, existing query
+selector reuse, heading/section/block fragment IDs, CLI access via
+`mkt ref resolve`, reference docs, examples, and tests. Region/tag/fenced-block
+addressing continues in P10.3; processor dependency/provenance use continues in
+P10.2 and P10.5.
+
 ## P10.2 - Add token-safe transforms and operation provenance
 
 ```task
 id: MKTT-WP-0010-T002
-status: todo
+status: done
 priority: high
 state_hub_task_id: "e35639b7-756f-4993-8b3c-2e58b23e0eca"
 ```
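The P10.1 note in the hunk above lists root-bounded path resolution and frontmatter namespaces. One way to enforce the root bound is to resolve the candidate path fully and then require it to stay under the resolved root. The sketch below assumes that namespace paths are relative to the referencing document's directory. `resolve_target_path` and the error class are hypothetical names; only the error messages mirror the text that `tests/test_reference_resolution.py` matches on.

```python
from pathlib import Path
from typing import Mapping, Optional


class ReferenceResolutionError(ValueError):
    """Hypothetical stand-in for the error class the tests import."""


def resolve_target_path(
    address: str,
    *,
    root: Path,
    current_path: Path,
    namespace: Optional[str] = None,
    namespaces: Optional[Mapping[str, str]] = None,
) -> Path:
    root = root.resolve()
    namespaces = namespaces or {}
    if namespace is not None:
        if namespace not in namespaces:
            raise ReferenceResolutionError(f"Unknown namespace: {namespace}")
        # Assumption: namespace paths are relative to the referencing document.
        base = (current_path.parent / namespaces[namespace]).resolve()
    else:
        base = current_path.parent.resolve()
    # A pathless reference such as "#overview" targets the current document.
    target = (base / address).resolve() if address else current_path.resolve()
    try:
        # relative_to raises ValueError when target is not under root.
        target.relative_to(root)
    except ValueError:
        raise ReferenceResolutionError(f"Reference target escapes root: {target}") from None
    return target
```

The root and the target are both resolved before they are compared. As a result, `..` segments and symlinks are judged by where they actually land, not by how the reference string is spelled.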
@@ -80,11 +87,17 @@ structured operation provenance, dependency edges, source spans, and diagnostics
 
 Output: token-safe transform implementation and provenance result envelope.
 
+Initial implementation completed with token-safe heading shifts, include
+markers that stay literal inside fenced or indented code blocks, additive
+`OperationProvenance` events on transform/include results, dependency edges for
+resolved includes, docs, and regression tests. Rich structured diagnostics and
+source maps continue through P10.3, P10.4, and P10.5.
+
 ## P10.3 - Implement named regions and addressable block selectors
 
 ```task
 id: MKTT-WP-0010-T003
-status: todo
+status: done
 priority: high
 state_hub_task_id: "98cafe28-a364-48f1-ae55-cb47c71d9441"
 ```
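The P10.2 note in the hunk above describes heading shifts that leave code untouched. Below is a minimal sketch of a fence-aware shift using simplified CommonMark rules: fences are three or more backticks or tildes, and ATX headings have at most three spaces of indent. `shift_headings` is a hypothetical helper, not the package's transform API.

```python
import re

# Simplified CommonMark rules (assumption): a fence opens with 3+ backticks
# or tildes after at most 3 spaces; it closes on a line holding only the same
# character, repeated at least as many times. ATX headings allow at most
# 3 spaces of indent, so indented code (4+ spaces) never matches.
_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")
_HEADING = re.compile(r"^( {0,3})(#{1,6})(?=[ \t]|$)")


def shift_headings(markdown: str, by: int) -> str:
    """Shift ATX heading levels by `by` (clamped to 1..6), skipping fenced code."""
    out = []
    fence = None  # opening fence marker while inside a fenced block
    for line in markdown.splitlines(keepends=True):
        if fence is None:
            opening = _FENCE.match(line)
            if opening:
                fence = opening.group(1)
            else:
                heading = _HEADING.match(line)
                if heading:
                    level = min(6, max(1, len(heading.group(2)) + by))
                    line = heading.group(1) + "#" * level + line[heading.end():]
            out.append(line)
        else:
            # Inside a fence: copy verbatim and watch for the closing marker.
            stripped = line.strip()
            if (_FENCE.match(line) and set(stripped) == {fence[0]}
                    and len(stripped) >= len(fence)):
                fence = None
            out.append(line)
    return "".join(out)
```

The key design point is that fence state is tracked line by line before any heading regex is applied. Without that tracking, a `# comment` inside a shell or Python block would be rewritten.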
@@ -94,11 +107,17 @@ selection by ID/tag/line range where appropriate.
 
 Output: region parser/resolver, CLI examples, and source-snippet tests.
 
+Initial implementation completed as reference-layer extensions: named
+`mkt:region` comments, region tags, fenced-block IDs and tags from info-string
+attributes, `#line:start-end` ranges, convenience ID lookup ordering, docs,
+examples, and tests. Deeper source maps and processor-owned block semantics
+continue in P10.5 and P10.6.
+
 ## P10.4 - Reimplement reversible explode/implode variants
 
 ```task
 id: MKTT-WP-0010-T004
-status: todo
+status: done
 priority: high
 state_hub_task_id: "67f77aa1-a7ee-485c-891e-6ae7ecc52067"
 ```
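The P10.3 note above covers `mkt:region` comments, region tags, and `#line:start-end` ranges. Below is a minimal sketch of both lookups. It assumes the comment syntax used in `tests/test_reference_resolution.py`, with `id` written before `tags`. The function names are hypothetical, and nested regions are not handled.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical parser for non-nested region markers of the form
#   <!-- mkt:region id="..." tags="..." --> ... <!-- /mkt:region -->
_REGION = re.compile(
    r'<!--\s*mkt:region\s+id="(?P<id>[^"]+)"(?:\s+tags="(?P<tags>[^"]*)")?\s*-->'
    r"(?P<body>.*?)"
    r"<!--\s*/mkt:region\s*-->",
    re.DOTALL,
)


def find_regions(markdown: str) -> Dict[str, Tuple[List[str], str]]:
    """Map region id -> (tags, text), trimming newlines around the body."""
    regions = {}
    for match in _REGION.finditer(markdown):
        tags = (match.group("tags") or "").split()
        regions[match.group("id")] = (tags, match.group("body").strip("\n"))
    return regions


def regions_with_tag(markdown: str, tag: str) -> List[str]:
    """Return the ids of all regions carrying `tag`, in document order."""
    return [rid for rid, (tags, _) in find_regions(markdown).items() if tag in tags]


def line_range(markdown: str, start: int, end: int) -> str:
    """Return 1-based, inclusive lines start..end, as `#line:3-4` suggests."""
    if start < 1 or end < start:
        raise ValueError(f"invalid line range: {start}-{end}")
    return "\n".join(markdown.splitlines()[start - 1:end])
```

Because line ranges are 1-based and inclusive, `#line:3-4` on `"# Context\n\nLine A\nLine B\nLine C\n"` selects the third and fourth physical lines. This matches the expectation in `test_resolve_line_range_fragment`.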
@@ -111,11 +130,16 @@ reference and processor model is stable.
 
 Output: `mkt explode`, `mkt implode`, manifest schema, roundtrip tests.
 
+Initial implementation completed with a separate `explode` extension package,
+manifest-first flat and hierarchical variants, exact roundtrip implode,
+non-empty output protection, CLI commands, docs, and tests. Semantic variants
+remain deferred until processor and content-class semantics are stable.
+
 ## P10.5 - Define processor registry for fenced blocks
 
 ```task
 id: MKTT-WP-0010-T005
-status: todo
+status: done
 priority: high
 state_hub_task_id: "eb7cde08-8a73-4163-ac54-19a2bc7b5f88"
 ```
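The P10.4 note above rests on one property: imploding must reproduce the source byte for byte. Below is a sketch of a flat variant that gets this property by construction, because every chunk is a contiguous slice of the original. `explode_flat`, `implode_flat`, the `0000.md` naming, and the `manifest.json` layout are all hypothetical and do not describe the package's manifest schema.

```python
import json
import re
from pathlib import Path
from typing import List

# ATX heading start: at most 3 spaces of indent, 1-6 "#", then space/tab/end.
_HEADING = re.compile(r"^ {0,3}#{1,6}(?=[ \t]|$)")


def explode_flat(markdown: str, out_dir: Path) -> List[str]:
    """Split `markdown` before each heading line and write one file per chunk.

    Hypothetical sketch: fences are not special-cased, so a "#" line inside a
    code fence also starts a chunk. Round-tripping is still exact because the
    chunks are contiguous slices of the original text.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    if any(out_dir.iterdir()):
        raise FileExistsError(f"refusing to explode into non-empty {out_dir}")
    chunks: List[str] = []
    current: List[str] = []
    for line in markdown.splitlines(keepends=True):
        if _HEADING.match(line) and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))
    names = []
    for index, chunk in enumerate(chunks):
        name = f"{index:04d}.md"
        # Write bytes so no newline translation can alter the content.
        (out_dir / name).write_bytes(chunk.encode("utf-8"))
        names.append(name)
    manifest = {"variant": "flat", "files": names}
    (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return names


def implode_flat(out_dir: Path) -> str:
    """Concatenate chunk files in manifest order."""
    manifest = json.loads((out_dir / "manifest.json").read_text(encoding="utf-8"))
    return "".join((out_dir / name).read_bytes().decode("utf-8") for name in manifest["files"])
```

The manifest, not the directory listing, defines the order. The refusal to write into a non-empty directory mirrors the "non-empty output protection" the note mentions.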
@@ -126,11 +150,18 @@ and return generated content/files, diagnostics, dependencies, and provenance.
 
 Output: processor registry API, deterministic built-in processors, and tests.
 
+Initial implementation completed with a deterministic `processor` extension
+package, fenced-block discovery, explicit registry, context/policy envelope,
+result files/diagnostics/dependencies/provenance, built-in identity,
+uppercase, and reference-backed include processors, CLI `mkt process`, docs,
+examples, and tests. Arbitrary code or LLM execution remains intentionally
+outside this deterministic registry floor.
+
 ## P10.6 - Implement literate weave/tangle MVP
 
 ```task
 id: MKTT-WP-0010-T006
-status: todo
+status: done
 priority: high
 state_hub_task_id: "090fcc38-758b-4414-b941-40f217eb17ca"
 ```
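The P10.5 note above describes fenced-block discovery, an explicit registry, and diagnostics. Below is a minimal sketch of that dispatch loop. It assumes blocks whose info string looks like `mkt-NAME {#id}`, as in the processor tests at the top of this page. Every name in it is hypothetical. The real registry also carries context, policy, dependencies, and provenance, all of which this sketch omits.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical discovery pattern: ```mkt-NAME {#id ...}\n body \n```
_BLOCK = re.compile(
    r"^```mkt-(?P<name>[\w-]+)(?:\s*\{#(?P<id>[\w-]+)[^}]*\})?[ \t]*\n"
    r"(?P<body>.*?)^```[ \t]*$",
    re.MULTILINE | re.DOTALL,
)

Processor = Callable[[str], str]


@dataclass
class Diagnostic:
    code: str
    message: str


@dataclass
class BlockResult:
    processor: str
    block_id: Optional[str]
    content: str = ""
    diagnostics: List[Diagnostic] = field(default_factory=list)


def run_blocks(markdown: str, registry: Dict[str, Processor]) -> List[BlockResult]:
    """Run each mkt-* block through its registered processor, in document order."""
    results = []
    for match in _BLOCK.finditer(markdown):
        name, body = match.group("name"), match.group("body")
        result = BlockResult(processor=name, block_id=match.group("id"))
        handler = registry.get(name)
        if handler is None:
            # Unknown processors become diagnostics, not exceptions.
            result.diagnostics.append(
                Diagnostic("processor.unknown", f"No processor registered for mkt-{name}")
            )
        else:
            result.content = handler(body)
        results.append(result)
    return results


# Deterministic built-ins, loosely modeled on the identity/uppercase ones named above.
REGISTRY: Dict[str, Processor] = {
    "identity": lambda body: body,
    "uppercase": lambda body: body.upper(),
}
```

Passing the registry explicitly, rather than discovering plugins, keeps the run deterministic. A document can only invoke processors the caller chose to register.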
@@ -141,11 +172,16 @@ cross-references.
 
 Output: `mkt tangle`, `mkt weave`, chunk-reference diagnostics, examples.
 
+Initial implementation completed with a `literate` extension package, named
+fenced code chunks, `tangle` targets, noweb-style `<<chunk-id>>` expansion,
+missing/cyclic chunk diagnostics, deterministic file writing, woven chunk
+index output, CLI `mkt tangle`/`mkt weave`, docs, examples, and tests.
+
 ## P10.7 - Design content class composition and multi-inheritance
 
 ```task
 id: MKTT-WP-0010-T007
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "220e6b27-2d7b-4c22-b5e8-304198ecfea8"
 ```
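The P10.6 note above mentions noweb-style `<<chunk-id>>` expansion with diagnostics for missing and cyclic chunks. Below is a minimal sketch of that expansion as a depth-first walk. The walk carries the current chain of chunk names, so a cycle can be reported with its full path. `expand` and `tangle` are hypothetical names. Chunks are given as a plain dict rather than parsed from fenced blocks, and only whole-line references are expanded.

```python
import re
from typing import Dict, List, Tuple

# A line consisting only of <<chunk-id>> (optionally indented) is a reference.
_REF = re.compile(r"^(?P<indent>[ \t]*)<<(?P<name>[\w.-]+)>>[ \t]*$")


def expand(
    name: str,
    chunks: Dict[str, str],
    diagnostics: List[str],
    stack: Tuple[str, ...] = (),
) -> List[str]:
    """Expand chunk `name` into lines, recording missing/cyclic references."""
    if name in stack:
        diagnostics.append("cycle: " + " -> ".join(stack + (name,)))
        return []
    if name not in chunks:
        diagnostics.append(f"missing chunk: {name}")
        return []
    lines: List[str] = []
    for line in chunks[name].splitlines():
        match = _REF.match(line)
        if match is None:
            lines.append(line)
            continue
        # Re-indent the expanded child lines to the reference's indentation.
        indent = match.group("indent")
        for child in expand(match.group("name"), chunks, diagnostics, stack + (name,)):
            lines.append(indent + child if child else child)
    return lines


def tangle(root: str, chunks: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return the tangled text for `root` plus any diagnostics."""
    diagnostics: List[str] = []
    text = "\n".join(expand(root, chunks, diagnostics)) + "\n"
    return text, diagnostics
```

Carrying indentation into the expansion matters for languages like Python. There, a `<<body>>` placeholder inside a function must expand into correctly indented statements.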
@@ -156,11 +192,16 @@ diagnostics.
 
 Output: architecture note, examples, and a small deterministic resolver spike.
 
+Initial implementation completed with a `content_class` extension package,
+C3-style deterministic linearization, explicit slot merge policies, conflict
+diagnostics, CLI `mkt class resolve`, docs, examples, and tests. Markdown
+instantiation and snippet injection remain deferred to later integration work.
+
 ## P10.8 - Add migration examples from markitect-main
 
 ```task
 id: MKTT-WP-0010-T008
-status: todo
+status: done
 priority: high
 state_hub_task_id: "287637d3-1997-43b2-b97d-10587d565cec"
 ```
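The P10.7 note above and `docs/content-classes.md` describe a C3-style linearization followed by a base-to-leaf slot merge. Below is a minimal sketch of both steps. It assumes list-valued slots, and that each class's own `merge_policies` govern its contribution. Only `replace`, `append`, and `prepend` are covered; `deep_merge` and `error_on_conflict` are left out. All names are hypothetical.

```python
from typing import Dict, List, Tuple


class LinearizationError(ValueError):
    pass


def c3_linearize(name: str, parents: Dict[str, List[str]], _visiting: Tuple[str, ...] = ()) -> List[str]:
    """Return the C3 linearization of `name`, most specific class first."""
    if name in _visiting:
        raise LinearizationError("cycle: " + " -> ".join(_visiting + (name,)))
    if name not in parents:
        raise LinearizationError(f"unknown class: {name}")
    bases = parents[name]
    # Merge the parents' linearizations plus the local parent order.
    sequences = [c3_linearize(base, parents, _visiting + (name,)) for base in bases]
    sequences.append(list(bases))
    result = [name]
    while True:
        sequences = [seq for seq in sequences if seq]
        if not sequences:
            return result
        for seq in sequences:
            head = seq[0]
            # A valid head must not appear in the tail of any sequence.
            if not any(head in other[1:] for other in sequences):
                break
        else:
            raise LinearizationError(f"inconsistent precedence for {name}")
        result.append(head)
        sequences = [seq[1:] if seq[0] == head else seq for seq in sequences]


def resolve_slots(name: str, classes: Dict[str, dict]) -> Dict[str, list]:
    """Merge list-valued slots from base to leaf along the linearization."""
    parents = {cls: spec.get("extends", []) for cls, spec in classes.items()}
    slots: Dict[str, list] = {}
    for cls in reversed(c3_linearize(name, parents)):
        spec = classes[cls]
        policies = spec.get("merge_policies", {})
        for slot, values in spec.get("slots", {}).items():
            policy = policies.get(slot, "replace")
            if policy == "append":
                slots[slot] = slots.get(slot, []) + list(values)
            elif policy == "prepend":
                slots[slot] = list(values) + slots.get(slot, [])
            elif policy == "replace":
                slots[slot] = list(values)
            else:
                raise ValueError(f"policy not covered by this sketch: {policy}")
    return slots
```

Applied to the `base-prd`/`enterprise` example in `docs/content-classes.md`, this sketch yields `sections: [Problem, Decision, Compliance]`. The linearization itself raises on cycles, unknown parents, and inconsistent precedence, matching the three diagnostic cases that document lists.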
@@ -169,3 +210,9 @@ Translate the relevant old explode/implode, transclusion, and spaces reference
 graph tests into successor-style fixtures and examples.
 
 Output: migration test inventory, example documents, and parity notes.
+
+Initial implementation completed with WP-0010 migration parity notes,
+successor-style examples for explode/implode, path include, reference-backed
+transclusion, and literate tangling, plus tests that exercise these examples.
+Legacy platform, database, infospace, rendering, and provider-specific
+behaviors remain intentionally out of scope.