coulomb/markitect-tool

tegwick 4f010315bb Enhanced usecase example weave tangle for later workflows

2026-05-04 00:54:08 +02:00

Content References, Processors, and Literate Workflows

Date: 2026-05-04

Purpose

This note records the follow-up research after the first transform, compose, and include implementation. The goal is to keep markitect-tool close to Markdown while preserving the richer ideas that made markitect-main interesting: reversible explode/implode, transclusion, processors, namespaces, content references, and Knuth-style weave/tangle workflows.

Research Inputs

WEB on CTAN and Knuth/Levy CWEB: literate source is processed in two directions, one for compilable source and one for readable documentation.
noweb Hacker's Guide: language-independent literate programming benefits from a pipeline representation and named chunks that tools can extend.
Org Babel: source blocks can be executable, parameterized, named, reused, tangled to files, and woven into reproducible documents.
CommonMark fenced code blocks: fenced blocks are first-class Markdown structure and must be handled by the parser, not by naive global text rewrites.
Asciidoctor include directives and tagged regions: includes need predictable base-dir resolution, safe-mode boundaries, line and tag selection, and source-code-region reuse.
Sphinx literalinclude: code inclusion commonly needs line ranges, object-level extraction, highlighting metadata, dedent, and original line-number handling.
DITA conref and conkeyref: content reuse becomes much stronger when references have IDs, keys, scoped indirection, validity checks, and clear attribute merge rules.
W3C XInclude: inclusion should have an explicit processing model, target addressing, and fallback behavior.
JSON-LD 1.1 contexts: namespaces can map short terms to stable global identifiers while retaining compact authoring.
Python C3 MRO and CLOS concepts: multiple inheritance needs deterministic linearization, monotonicity, local precedence, and explicit method/slot combination rules.
Pandoc filters: processors can be cleanly modeled as AST transformations over document nodes and code blocks.

Lessons

Markdown can carry a surprisingly rich system if the extra semantics are placed in stable, inspectable constructs:

Frontmatter declares document-level identity, namespaces, and defaults.
Headings and fenced blocks become addressable content units.
Include/transclusion is a resolver over content references, not only file expansion.
Processors operate on typed blocks and produce diagnostics, dependencies, generated content, or files.
Weave/tangle is a special case of named content units plus processor targets.
Explode/implode needs a manifest with source spans and stable IDs so the directory form is not a lossy export.
Multiple inheritance is useful for document templates, regulatory overlays, style/persona overlays, and reusable content classes, but only if merge behavior is deterministic and diagnosable.

Use Cases

1. Reversible Large-Document Editing

An author explodes a long PRD/FRS into a directory, edits sections in separate files, then implodes it back into a canonical single Markdown document. The manifest preserves frontmatter policy, heading levels, ordering, source spans, and generated filenames.

2. Knuth-Style Markdown Weave/Tangle

A document explains a program in the order best for human understanding. Named code chunks are declared in fenced blocks, cross-reference each other, and tangle into one or more source files. The woven output keeps prose, chunk cross-links, and optionally generated indexes.

3. Executable Documentation Pipelines

Fenced blocks act as processors: shell, Python, SQL, validation, diagram, or custom processors can consume inputs, emit outputs, and record dependencies. Execution is optional and controlled; pure transforms remain deterministic.

4. Reusable Legal, Contract, and Product Clauses

Common clauses are defined once with stable IDs. Documents include them by namespace/key and can select variants by jurisdiction, customer type, language, or document class. Diagnostics explain missing keys and conflicting variants.

5. Source Snippet Documentation

Docs include code by tag, line range, parser object, or named block while preserving source line references. This supports API docs, changelog examples, and tutorials that stay aligned with source files.
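As a rough illustration of line-range inclusion that keeps the original line number, here is a minimal sketch. The function name `include_lines` and its parameters are hypothetical and not part of markitect-tool; tag and parser-object selection are out of scope.

```python
import textwrap


def include_lines(source: str, start: int, end: int, dedent: bool = True):
    """Return (snippet, first_line_number) for a 1-based inclusive line range."""
    lines = source.splitlines()
    if not (1 <= start <= end <= len(lines)):
        raise ValueError(f"line range {start}-{end} is outside 1-{len(lines)}")
    snippet = "\n".join(lines[start - 1:end])
    if dedent:
        # Remove the common leading whitespace so nested code reads cleanly.
        snippet = textwrap.dedent(snippet)
    # The caller keeps `start` so rendered output can cite original line numbers.
    return snippet, start
```

Returning the starting line alongside the snippet lets a later weave step print source references without re-reading the file.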

6. Content Classes and Multiple Inheritance

A document can be treated as an instance of several content classes: for example base:prd, market:enterprise, jurisdiction:eu, and style:board-brief. Slot values, assertions, sections, and snippets resolve in a deterministic order with explicit merge strategies.

7. Agent Context Packages

An agent can request a namespace, topic, chunk, section, or graph slice and get a bounded context package with provenance, dependencies, hashes, and security labels. This dovetails with later cache and memory work.

8. Security-Sensitive Knowledge Gateways

References and processor outputs carry labels. Policy can filter or redact content before transclusion, weaving, tangling, or context-package creation.
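One possible shape for such a policy hook is sketched below. It assumes each resolved unit is a plain dict with `content` and `labels` keys and that policy is a set of permitted labels; both are illustrative assumptions, not an existing markitect-tool interface.

```python
def apply_label_policy(units, allowed_labels, placeholder="[redacted]"):
    """Pass through units whose labels are all allowed; redact the rest."""
    allowed = set(allowed_labels)
    filtered = []
    for unit in units:
        if set(unit["labels"]) <= allowed:
            filtered.append(unit["content"])
        else:
            # Redact rather than drop, so document structure stays visible.
            filtered.append(placeholder)
    return filtered
```

Redacting instead of silently dropping keeps the position of withheld content visible to the reader; a stricter policy could drop the unit and emit a diagnostic instead.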

Architecture Blueprint

Content Unit Model

The parser should expose addressable units beyond the current document, section, and block lists:

document
frontmatter path
section
block
fenced block
named region
named chunk
processor result

Each unit should have:

stable local ID
optional global name
source path and source span
kind/type
content hash
dependency list
labels/policy metadata
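The attribute list above could map onto a small immutable record. This is only a sketch: the class and field names (`ContentUnit`, `SourceSpan`, `local_id`, and so on) are hypothetical, and SHA-256 is an assumed choice of content hash.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSpan:
    """Line range in a source file (1-based, inclusive; an assumed convention)."""
    path: str
    start_line: int
    end_line: int


@dataclass(frozen=True)
class ContentUnit:
    """One addressable unit: section, fenced block, named region, chunk, ..."""
    local_id: str
    kind: str
    span: SourceSpan
    content: str
    global_name: str | None = None
    dependencies: tuple[str, ...] = ()
    labels: frozenset[str] = frozenset()

    @property
    def content_hash(self) -> str:
        # Derived from content, so it cannot drift out of sync with the text.
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()
```

Computing the hash as a property rather than storing it avoids stale hashes, at the cost of rehashing on each access.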

Reference Syntax

Keep Markdown readable and allow several levels of precision:

<!-- mkt:include ref="std:clauses/payment" -->
<!-- mkt:include path="sections/intro.md" selector="sections[heading=Summary]" -->
<!-- mkt:include ref="src:api#tag:create-user" mode="literal" -->

Frontmatter can define namespaces:

namespaces:
  std: ./standards/
  src: ../src/
  contract: ./contracts/

References should resolve through a single resolver API:

namespace + address + selector + mode + context -> resolved content unit(s)
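The first half of that pipeline, parsing a `ref` and mapping its namespace to a path, might look like the sketch below. The `namespace:address#selector` split is inferred from the examples above; `Reference`, `parse_ref`, and `resolve_path` are hypothetical names, and selector evaluation, mode, and context are left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Reference:
    namespace: str
    address: str
    selector: str | None = None


def parse_ref(ref: str) -> Reference:
    """Split 'namespace:address#selector' at the first ':' and the first '#'."""
    namespace, sep, rest = ref.partition(":")
    if not sep or not namespace or not rest:
        raise ValueError(f"expected 'namespace:address', got {ref!r}")
    address, _, selector = rest.partition("#")
    return Reference(namespace, address, selector or None)


def resolve_path(ref: Reference, namespaces: dict[str, str]) -> PurePosixPath:
    """Map a reference to a path under its namespace root."""
    if ref.namespace not in namespaces:
        raise KeyError(f"unknown namespace {ref.namespace!r}")
    address = PurePosixPath(ref.address)
    # Keep addresses inside the namespace root: no absolute paths, no '..'.
    if address.is_absolute() or ".." in address.parts:
        raise ValueError(f"address escapes namespace root: {ref.address!r}")
    return PurePosixPath(namespaces[ref.namespace]) / address
```

With the frontmatter mapping shown earlier, `std:clauses/payment` would resolve under `./standards/`. Whether a file extension is appended is a resolver decision this sketch does not make.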

Region and Chunk Syntax

Use comments for regions so they can live inside Markdown or source files:

<!-- mkt:region id="overview" -->
Reusable content.
<!-- /mkt:region -->
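A first cut at collecting such regions can be done with a regular expression, as sketched here. It assumes regions are not nested and does not skip fenced code blocks, so a real implementation would run over parsed blocks instead. `extract_regions` is a hypothetical name.

```python
import re

REGION_RE = re.compile(
    r'<!--\s*mkt:region\s+id="(?P<id>[^"]+)"\s*-->\n'
    r"(?P<body>.*?)"
    r"<!--\s*/mkt:region\s*-->",
    re.DOTALL,
)


def extract_regions(text: str) -> dict:
    """Return {region_id: body}; duplicate ids are reported as errors."""
    regions = {}
    for match in REGION_RE.finditer(text):
        region_id = match.group("id")
        if region_id in regions:
            raise ValueError(f"duplicate region id {region_id!r}")
        regions[region_id] = match.group("body")
    return regions
```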

Use fenced blocks for executable or tangleable chunks:

```python {#load-config tangle="src/config.py"}
def load_config(path):
    return {}
```

Chunk references can stay close to noweb:

<<load-config>>

The processor layer decides whether chunk references are expanded during tangle, displayed during weave, or left literal.

Processor Registry

Processors should be pluggable but explicit. A processor receives:

unit content and metadata
resolver
execution context
policy context
output target request

It returns:

transformed content, generated files, or computed values
diagnostics
dependency edges
provenance events

Core processors should start deterministic: include, region, explode/implode, tangle, weave, and simple text/Markdown transforms. Executing arbitrary code is a later, opt-in capability.
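A registry along these lines could start very small. In the sketch below, `ProcessorResult`, `ProcessorRegistry`, and the two-argument processor signature are hypothetical simplifications; the resolver, execution context, and policy context described above are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProcessorResult:
    content: str
    diagnostics: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)


# A processor takes (unit content, unit metadata) and returns a result.
Processor = Callable[[str, dict], ProcessorResult]


class ProcessorRegistry:
    def __init__(self) -> None:
        self._processors: dict[str, Processor] = {}

    def register(self, name: str, processor: Processor) -> None:
        # Explicit registration only: overriding a name silently is an error.
        if name in self._processors:
            raise ValueError(f"processor {name!r} already registered")
        self._processors[name] = processor

    def run(self, name: str, content: str, metadata: dict) -> ProcessorResult:
        if name not in self._processors:
            # Unknown processors yield a diagnostic instead of an exception.
            return ProcessorResult(content, [f"unknown processor {name!r}"])
        return self._processors[name](content, metadata)


def upper_processor(content: str, metadata: dict) -> ProcessorResult:
    """Toy deterministic transform used only to exercise the registry."""
    return ProcessorResult(content.upper())
```

Returning a diagnostic for an unknown processor, rather than raising, matches the goal of structured diagnostics; duplicate registration still raises because it is a programming error rather than a document error.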

Explode/Implode

Explode/implode should become a first-class reversible operation, not a loose directory export. The manifest should include:

original path and hash
variant type (flat, hierarchical, semantic)
frontmatter preservation policy
section/chunk/source-span entries
file paths and order
heading-level policy
warnings and lossless round-trip checks

The old markitect-main flat/hierarchical/semantic variants are worth reimplementing behind a small variant interface.
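To make the round-trip requirement concrete, here is a minimal sketch of a flat variant. It assumes splitting at level-2 ATX headings, backtick fences only, and an in-memory dict of files; the manifest keys (`variant`, `original_sha256`, `entries`) are illustrative, not the markitect-main format.

```python
import hashlib

FENCE = "`" * 3  # built indirectly so this example nests safely in Markdown


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def explode(text: str):
    """Split at level-2 ATX headings that sit outside fenced blocks."""
    parts = [[]]
    in_fence = False
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        elif line.startswith("## ") and not in_fence:
            parts.append([])
        parts[-1].append(line)
    files, entries = {}, []
    for index, lines in enumerate(parts):
        body = "".join(lines)
        path = f"{index:03d}.md"
        files[path] = body
        entries.append({"path": path, "order": index, "sha256": sha256(body)})
    manifest = {"variant": "flat", "original_sha256": sha256(text), "entries": entries}
    return manifest, files


def implode(manifest, files) -> str:
    """Concatenate the files in manifest order."""
    ordered = sorted(manifest["entries"], key=lambda entry: entry["order"])
    return "".join(files[entry["path"]] for entry in ordered)


def roundtrip_ok(manifest, files) -> bool:
    """True when imploding reproduces the original document byte for byte."""
    return sha256(implode(manifest, files)) == manifest["original_sha256"]
```

Because every line is kept with its line ending, an unedited explode followed by implode is lossless. After an edit, `roundtrip_ok` returns `False` by design, and the per-entry hashes show which files changed.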

Weave/Tangle

Tangle extracts named chunks to target files, expanding chunk references in a deterministic dependency order. Weave renders human-readable documentation with chunk backlinks and optional source indexes.

Minimum useful MVP:

discover named fenced blocks
support tangle="<path>"
concatenate multiple chunks for the same target in document order
expand <<chunk-id>> inside code
detect missing/cyclic chunk references
emit source mapping comments optionally
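The MVP steps above can be sketched in one short module. This is an illustration under several assumptions: the info-string form `{#id tangle="path"}` is taken from the example earlier in this note, chunk references must sit alone on a line, duplicate chunk ids are rejected, and source-mapping comments are omitted. All function names are hypothetical.

```python
import re

FENCE = "`" * 3  # built indirectly so this example nests safely in Markdown
OPEN_RE = re.compile(
    "^" + FENCE + r'\w*\s*\{#(?P<id>[\w-]+)(?:\s+tangle="(?P<target>[^"]+)")?\}\s*$'
)
REF_RE = re.compile(r"^(?P<indent>[ \t]*)<<(?P<id>[\w-]+)>>[ \t]*$", re.MULTILINE)


def discover_chunks(markdown: str):
    """Return ({chunk_id: code}, [(target, chunk_id), ...]) in document order."""
    chunks, targets = {}, []
    current, lines = None, []
    for line in markdown.splitlines():
        if current is None:
            match = OPEN_RE.match(line)
            if match:
                current, lines = match.group("id"), []
                if match.group("target"):
                    targets.append((match.group("target"), current))
        elif line.strip() == FENCE:
            if current in chunks:
                raise ValueError(f"duplicate chunk id {current!r}")
            chunks[current] = "\n".join(lines)
            current = None
        else:
            lines.append(line)
    return chunks, targets


def expand(chunk_id: str, chunks: dict, stack: tuple = ()) -> str:
    """Recursively expand <<id>> lines, preserving the reference's indentation."""
    if chunk_id not in chunks:
        raise KeyError(f"missing chunk {chunk_id!r}")
    if chunk_id in stack:
        raise ValueError("cyclic chunk reference: " + " -> ".join(stack + (chunk_id,)))

    def substitute(match):
        body = expand(match.group("id"), chunks, stack + (chunk_id,))
        indent = match.group("indent")
        return "\n".join(indent + ln if ln else ln for ln in body.splitlines())

    return REF_RE.sub(substitute, chunks[chunk_id])


def tangle(markdown: str) -> dict:
    """Return {target_path: source}, concatenating chunks per target in order."""
    chunks, targets = discover_chunks(markdown)
    outputs = {}
    for target, chunk_id in targets:
        outputs.setdefault(target, []).append(expand(chunk_id, chunks))
    return {target: "\n".join(parts) + "\n" for target, parts in outputs.items()}
```

Carrying the reference's indentation into the expanded body matters for indentation-sensitive targets such as Python. Returning a dict instead of writing files keeps the step deterministic and easy to test.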

Content Class and Multiple Inheritance

Document classes should be data, not Python inheritance. A class can define:

slots
required sections
snippets
assertions
processors
merge policies

An instance declares:

document_class:
  extends:
    - contract:prd
    - market:enterprise
    - jurisdiction:eu

Resolution should use a C3-like linearization. Merge policies must be explicit:

replace
append
prepend
deep_merge
before:<slot>
after:<slot>
error_on_conflict

Diagnostics should report inconsistent precedence, ambiguous slot definitions, and merge-policy violations.
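For reference, a compact C3 linearization over plain data, plus a toy slot merge, could look like the sketch below. The `parents` mapping shape, the function names, and the exact semantics given to `replace`, `append`, and `error_on_conflict` are assumptions made for illustration; cyclic `extends` chains are not detected here.

```python
def c3_linearize(name, parents):
    """Linearize `name` given {class: [direct parents, most preferred first]}."""
    direct = list(parents.get(name, []))
    sequences = [c3_linearize(parent, parents) for parent in direct] + [direct]
    sequences = [seq for seq in sequences if seq]
    order = [name]
    while sequences:
        for seq in sequences:
            head = seq[0]
            # A head is acceptable only if no other sequence still needs
            # something to come before it (it appears in no tail).
            if not any(head in other[1:] for other in sequences):
                break
        else:
            raise ValueError(f"inconsistent precedence while linearizing {name!r}")
        order.append(head)
        sequences = [[c for c in seq if c != head] for seq in sequences]
        sequences = [seq for seq in sequences if seq]
    return order


def merge_slot(order, slot_values, policy):
    """Merge list-valued slots along a most-specific-first linearization."""
    defined = [cls for cls in order if cls in slot_values]
    if policy == "replace":
        # The most specific definition wins outright.
        return list(slot_values[defined[0]]) if defined else []
    if policy == "append":
        # Base values first, most specific values last.
        return [value for cls in reversed(defined) for value in slot_values[cls]]
    if policy == "error_on_conflict":
        if len(defined) > 1:
            raise ValueError(f"slot defined by several classes: {defined}")
        return list(slot_values[defined[0]]) if defined else []
    raise ValueError(f"unknown merge policy {policy!r}")
```

The `ValueError` raised from the linearization is the natural place to attach the "inconsistent precedence" diagnostic mentioned above.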

Comparison with Current Implementation

What we have now is a good kernel:

Parser/frontmatter/sections/blocks
Contracts and deterministic diagnostics
Query/extraction over structured documents
Transform, compose, and include operations
Safe include path boundaries and cycle checks

What is missing for the richer framework:

stable content IDs and namespaces
region/tag selectors
fenced-block-aware transforms
operation provenance and dependency graphs
structured include diagnostics in addition to fail-fast exceptions
reversible explode/implode with manifests
processor registry
named chunks and weave/tangle
class/object composition with deterministic multi-inheritance
line/source maps across generated outputs
security labels and policy hooks on resolved units

The clean path is to keep current ops as the small deterministic surface and grow this richer system as a framework layer. That protects simple CLI use while opening a strong route to sophisticated knowledge/programming pipelines.
