tegwick d4afce3699 research: literate programming deep dive (WEB/weave/tangle); UC-83 (SHARD-WP-0004 T1)
One source → N co-equal derived projections (weave=docs, tangle=code);
named-chunk transclusion; splits replication- vs derivation-projection.
Generalizes compile-to-static (UC-79). Enriches UC-32/44/79; links UC-54.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 22:43:30 +02:00


Literate Programming (Knuth's WEB / weave / tangle) — deep dive (findings)

Date: 2026-06-14 · Source: SHARD-WP-0004 T1 · Subject: Donald Knuth's WEB system and the literate-programming model (weave/tangle, named chunks).

Why this dive

SHARD-WP-0004 asks the carried question: can a shard-wiki page be a source that is woven/evaluated into rendered forms, and how do projection/transclusion/provenance treat the source vs the output? Literate programming is the deepest ancestor of that idea. Knuth (1984) inverted the program/comment relationship: you write a document for humans whose fragments happen to also be the program. From the one WEB source two tools derive two artifacts: weave → typeset documentation (TeX) and tangle → compilable code (Pascal in the original WEB, C in CWEB, …). This is "one source, two projections" in its purest, oldest form, and the conceptual root of shard-wiki's projection and transclusion.

1. The WEB model: one source, two tools

A WEB file interleaves prose and code in author-chosen order (the order that best explains, not the order the compiler needs). Two programs read it:

  • weave produces a .tex file → typeset documentation: prose, pretty-printed code, a cross-reference index of where each chunk and identifier is defined/used.
  • tangle produces a compilable source file → reorders and expands the code chunks into the sequence the compiler demands, strips the prose, macro-expands references.

The crucial property: the two outputs are co-derived from one canonical source and serve different audiences (human reader vs compiler). Neither output is the source; both are regenerable, disposable derivations. Editing happens on the WEB source; you never edit the woven .tex or the tangled .c.
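
The one-source, two-tools shape can be sketched in a few lines. This is a minimal illustration, not Knuth's WEB: it assumes a simplified noweb-style markup (`<<name>>=` opens a code chunk, a lone `@` closes it), the function names `parse`, `weave` and `tangle` are illustrative, and this `tangle` only strips prose (no reordering or reference expansion).

```python
import re

# Assumed markup: "<<name>>=" opens a code chunk, a lone "@" closes it.
CHUNK_START = re.compile(r"^<<(.+)>>=$")

SOURCE = """\
We first greet the reader.
<<greet>>=
print("hello")
@
Then we say goodbye.
<<farewell>>=
print("bye")
@
"""


def parse(source):
    """Split the source into (kind, chunk_name, lines) segments."""
    segments = []
    current = ("prose", None, [])
    for line in source.splitlines():
        start = CHUNK_START.match(line)
        if start:
            segments.append(current)
            current = ("code", start.group(1), [])
        elif line == "@" and current[0] == "code":
            segments.append(current)
            current = ("prose", None, [])
        else:
            current[2].append(line)
    segments.append(current)
    return [seg for seg in segments if seg[2]]  # drop empty segments


def weave(segments):
    """Human-facing projection: prose kept, code labelled and indented."""
    out = []
    for kind, name, lines in segments:
        if kind == "prose":
            out.extend(lines)
        else:
            out.append(f"[{name}]")
            out.extend("    " + line for line in lines)
    return "\n".join(out)


def tangle(segments):
    """Compiler-facing projection: code only, prose stripped."""
    return "\n".join(
        line for kind, _, lines in segments if kind == "code" for line in lines
    )


segments = parse(SOURCE)
docs = weave(segments)    # disposable: regenerate from SOURCE at any time
code = tangle(segments)   # disposable: regenerate from SOURCE at any time
```

Both `docs` and `code` are computed from `SOURCE` and never edited directly, which is the property the paragraph above describes.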

2. Named chunks = transclusion of code fragments

The organizing primitive is the named section / code chunk:

  • A chunk is declared <<name>>= and referenced elsewhere as <<name>> (noweb notation; WEB and CWEB write @<name@>= and @<name@>). tangle expands references in place (recursively) to assemble the final program.
  • A chunk name can be defined in multiple places and is accreted: a repeated definition appends to the earlier ones in source order (a += rather than an overwrite), so one logical unit is authored across scattered locations.
  • References can appear before definitions; resolution is by name, not by position.

This is transclusion by reference (UC-32) and compose-by-reference / manifest (UC-44, the EDL/Xanadu lineage): the document is a graph of named fragments assembled at derivation time. Knuth's chunk graph is the same shape as Xanadu's reference-not-copy and shard-wiki's "compose a page from referenced parts" — applied to executable content and resolved by a build tool rather than a viewer.
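
The three chunk rules above (recursive expansion, accretion, resolution by name rather than position) can be sketched as follows. This is a hedged illustration in noweb-style notation, not the real notangle: `collect_chunks` and `expand` are hypothetical names, and references are only recognized when they occupy a whole line.

```python
import re
from collections import defaultdict

DEF = re.compile(r"^<<(.+)>>=$")        # chunk definition line
REF = re.compile(r"^(\s*)<<(.+)>>$")    # whole-line chunk reference

SOURCE = """\
The program is a main routine; its parts are explained later.
<<main>>=
def main():
    <<setup>>
    <<report>>
@
Setup comes first.
<<setup>>=
total = 0
@
<<report>>=
print(total)
@
Later we realise setup needs one more step.
<<setup>>=
total += 1
@
"""


def collect_chunks(source):
    """Map chunk name -> code lines; repeated definitions accrete."""
    chunks = defaultdict(list)
    name = None
    for line in source.splitlines():
        definition = DEF.match(line)
        if definition:
            name = definition.group(1)
        elif line == "@":
            name = None
        elif name is not None:
            chunks[name].append(line)
    return chunks


def expand(name, chunks, stack=()):
    """Expand references recursively, by name, keeping indentation."""
    if name in stack:
        raise ValueError(f"cyclic chunk reference: {name}")
    if name not in chunks:
        raise KeyError(f"undefined chunk: {name}")
    out = []
    for line in chunks[name]:
        ref = REF.match(line)
        if ref:
            indent, target = ref.groups()
            expanded = expand(target, chunks, stack + (name,))
            out.extend(indent + text for text in expanded)
        else:
            out.append(line)
    return out


program = "\n".join(expand("main", collect_chunks(SOURCE)))
```

Here `main` refers to `setup` and `report` before either is defined, and the two `setup` definitions are concatenated in source order, so `program` is a four-line function body assembled from three scattered locations.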

3. The descendants (noweb, CWEB, org-babel, Sweave/knitr, Jupytext)

  • CWEB (Knuth/Levy): WEB for C. noweb (Ramsey): language-agnostic, minimal markup (<<chunk>>), notangle/noweave — proof that the model (chunks + two projections) is separable from any one language or typesetter.
  • org-babel (Emacs Org-mode): named source blocks, :noweb references, tangle to files and evaluate blocks inline (results captured back into the document) — literate programming fused with notebook execution (bridges to T2/T3).
  • Sweave / knitr (R): weave prose + R, executing code and interleaving computed results/figures into the woven document — adds the computed-output dimension that Jupyter (T3) centers on.
  • Jupytext: represents a Jupyter notebook as a literate text/Markdown source — closing the loop: the notebook (T3) becomes a weave/tangle-style plain-text source.

The throughline: one canonical source → N derived projections, where projections may be (a) reformatted (weave), (b) reordered/extracted (tangle), or (c) evaluated (org-babel/knitr) — the evaluated case is exactly the computational-page question.
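
Case (c), the evaluated projection, is the least familiar of the three, so a small sketch may help. It assumes the source has already been split into prose and code segments, runs each code segment in one shared namespace, and interleaves the captured output into the woven text. `weave_evaluated` and the `#>` output prefix are illustrative choices, not the behaviour of org-babel or knitr.

```python
import io
from contextlib import redirect_stdout

# Assumed input: the source already split into (kind, text) segments.
SEGMENTS = [
    ("prose", "We compute a small sum."),
    ("code", "total = sum(range(5))\nprint(total)"),
    ("prose", "The value is reused below."),
    ("code", "print(total * 2)"),
]


def weave_evaluated(segments):
    """Weave prose and code, appending each chunk's captured stdout."""
    namespace = {}   # shared, so later chunks see earlier state
    out = []
    for kind, text in segments:
        if kind == "prose":
            out.append(text)
            continue
        out.append("    " + text.replace("\n", "\n    "))
        captured = io.StringIO()
        with redirect_stdout(captured):
            # exec runs arbitrary code: only acceptable for trusted sources.
            exec(text, namespace)
        out.extend("    #> " + line for line in captured.getvalue().splitlines())
    return "\n".join(out)


woven = weave_evaluated(SEGMENTS)
```

Unlike the reformatted and extracted projections, this one depends on an execution environment, so regenerating it may give a different result if that environment changes.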

4. Capability profile (as a would-be shard / page type)

| Dimension (synthesis spectrum) | Literate-programming source |
|---|---|
| Attachment mode | file-store (a WEB/.nw/.org text source in a repo) |
| Addressing granularity | document; named chunk as sub-page address |
| Content identity | source file path + chunk name (name-resolved, not position) |
| Structure | graph of named chunks assembled by reference |
| History | whatever VCS holds the source (git); text, diffable |
| Merge model | text/git merge on the source |
| Native query | none; weave emits a cross-reference index (derived) |
| Translation | source → woven docs and tangled code (build-time) |
| Write granularity | file / chunk (text region) |
| Operational envelope | a build tool (tangle/weave/noweb/babel) |
| Content opacity | transparent text |
| Provenance | VCS author/time; chunk cross-ref maps output→source location |
| Projection model | one source → many co-equal derived projections (new emphasis) |

5. INTENT mapping

Reinforcements

  • Projection (canonical vs derived). Literate programming is the archetype of "canonical source, regenerable derived view" — the same principle as ikiwiki compile-to-static (UC-79), but generalized: two-plus co-equal projections from one source (docs and code), not a single output. Strengthens the rule never confuse a projection for the source.
  • Transclusion / compose-by-reference. Named chunks are transclusion (UC-32) and a manifest of referenced parts (UC-44) — resolved at derivation time. Confirms transclusion=clone=embed=reference as one primitive that also covers fragment assembly of executable content.
  • Markdown-first but backend-neutral page model. noweb/org/Jupytext show the literate source can be Markdown-ish plain text — so a "literate page" fits the text-first model; the derivations are the non-text part.
  • Mechanism over policy. weave/tangle are mechanisms; which projections to materialize, when to regenerate, and where outputs go stay configurable.

Divergences / boundaries

  • shard-wiki is not a build system. It should recognize and present a source-with-projections (attach the source, surface derived views with provenance), not re-implement tangle/kernels. Materializing a projection may delegate to the source's own tool or be capability-gated to "snapshot only."
  • The interesting projection is derivation, not caching. shard-wiki's base projection is cache-like (lazy copy of remote content, UC-53/57). Weave/tangle is a different projection species: transform/derive one source into rendered forms. The contract should model projection as having (at least) two kinds: replication-projection and derivation-projection (compile/weave/evaluate).
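
The two-kind split could be modelled roughly as below. This is a sketch under assumptions, not the shard-wiki contract: `ProjectionKind`, `Projection` and `check_contract` are hypothetical names, and the only rule encoded is that a replication must reproduce its source while a derivation may transform it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class ProjectionKind(Enum):
    REPLICATION = "replication"   # lazy copy of remote content
    DERIVATION = "derivation"     # compile / weave / evaluate


@dataclass(frozen=True)
class Projection:
    name: str
    kind: ProjectionKind
    produce: Callable[[str], str]   # source text -> projected text


def check_contract(projection: Projection, source: str) -> bool:
    """A replication must equal its source; a derivation may differ."""
    output = projection.produce(source)
    if projection.kind is ProjectionKind.REPLICATION:
        return output == source
    return True


# Replication: the projected text is the source text.
mirror = Projection("mirror", ProjectionKind.REPLICATION, lambda text: text)

# Derivation: keep only lines indented by four spaces (a toy "code view").
code_only = Projection(
    "code-only",
    ProjectionKind.DERIVATION,
    lambda text: "\n".join(
        line[4:] for line in text.splitlines() if line.startswith("    ")
    ),
)
```

Tagging the kind explicitly lets a consumer tell "this is a copy of the source" apart from "this was computed from the source" without inspecting the content.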

What to keep

  1. One-source-many-projections as a first-class page-model + projection pattern (generalizes UC-79's single output) — see UC-83.
  2. Named-chunk transclusion as the executable-content face of UC-32/UC-44 (assembly by reference at derivation time).
  3. Output→source provenance (the woven cross-ref index): every derived view must point back to the exact source location it came from — never present derived output without that link (union-without-erasure for derivations).
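
Point 3 can be made concrete with a tangle that records, for every output line, the chunk and source line it came from. This is an illustrative sketch in noweb-style notation: `collect` and `tangle_with_provenance` are hypothetical names, references must occupy a whole line, and cycle detection is omitted for brevity.

```python
import re

DEF = re.compile(r"^<<(.+)>>=$")     # chunk definition line
REF = re.compile(r"^\s*<<(.+)>>$")   # whole-line chunk reference

SOURCE = """\
<<main>>=
<<init>>
print(x)
@
<<init>>=
x = 1
@
More prose.
<<init>>=
x += 1
@
"""


def collect(source):
    """Map chunk name -> [(source_line_number, text)] in source order."""
    chunks, name = {}, None
    for number, line in enumerate(source.splitlines(), start=1):
        definition = DEF.match(line)
        if definition:
            name = definition.group(1)
            chunks.setdefault(name, [])
        elif line == "@":
            name = None
        elif name is not None:
            chunks[name].append((number, line))
    return chunks


def tangle_with_provenance(root, chunks):
    """Return [(output_text, chunk_name, source_line_number)]."""
    out = []
    for number, text in chunks[root]:
        ref = REF.match(text)
        if ref:
            out.extend(tangle_with_provenance(ref.group(1), chunks))
        else:
            out.append((text, root, number))
    return out


derived = tangle_with_provenance("main", collect(SOURCE))
```

In this sketch every derived line carries exactly one source line number, even though the `init` chunk as a whole is spread over two definitions, so a derived view can always link a line back to the place it was authored.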

6. UC seed

| # | Seed | Disposition |
|---|---|---|
| UC-83 | Attach a single-source-multiple-projection artifact (a literate/woven source): treat the source as canonical, present each derived projection (e.g. a documentation view and a code view) with provenance back to the one source, edits target the source and projections regenerate (delegated to the source's tool or degraded to a static snapshot) | new |
|  | Named chunks <<name>> = transclusion / compose-by-reference of (executable) fragments | enrich UC-32, UC-44 |
|  | Generalize compile-to-static (single output) to N co-equal projections from one source | enrich UC-79 |
|  | Computed/evaluated projection (org-babel/knitr) = derivation-projection with results | links UC-54, foreshadows T3 |

7. Architecture notes for SHARD-WP-0002

  • T12 (page model): add one-source-many-projections as a page-model shape — a page may be a source whose presented forms are derivations (woven docs, tangled code, evaluated results), each carrying output→source provenance. Distinct from prose, typed records, query-defined, and inline-embedded objects already logged.
  • T16 (projection): split projection into replication-projection (lazy cache of remote content; the current default) vs derivation-projection (transform/compile/weave/evaluate a source into rendered forms). Derivation-projection is regenerable, may delegate to an external tool, and degrades to a captured snapshot when the tool is absent (graceful degradation).
  • Transclusion (T16): named-chunk-by-name resolution is a transclusion variant where the target is a fragment resolved by name across the source, assembled at derivation time — a concrete shape for UC-32/UC-44 mechanics.
  • Capability gating: "can derive projection X" is a capability; a shard that can't run the tool still exposes the source + any pre-built/snapshot projections (UC-83 degrade).
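
The degrade path described in these notes might look like the following. It is a sketch under assumptions: `View`, `present` and the three status labels are hypothetical, and "the tool is available" is modelled simply as whether a `derive` callable was supplied.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class View:
    content: str
    status: str      # "fresh", "snapshot" or "source-only"
    source_id: str   # provenance: which source this view belongs to


def present(
    source_id: str,
    source_text: str,
    derive: Optional[Callable[[str], str]],
    snapshot: Optional[str],
) -> View:
    """Prefer a fresh derivation, then a snapshot, then the bare source."""
    if derive is not None:
        # Capability present: regenerate the projection from the source.
        return View(derive(source_text), "fresh", source_id)
    if snapshot is not None:
        # Tool absent: fall back to a previously captured projection.
        return View(snapshot, "snapshot", source_id)
    # Nothing derived is available: still expose the source itself.
    return View(source_text, "source-only", source_id)


fresh = present("page-1", "source text", str.upper, "OLD OUTPUT")
degraded = present("page-1", "source text", None, "OLD OUTPUT")
bare = present("page-1", "source text", None, None)
```

Every branch keeps `source_id`, so even a degraded snapshot still points back to the source it was derived from.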

8. Open questions

  1. Does shard-wiki ever drive a derivation (run weave/tangle/evaluate), or only attach sources and surface pre-built projections + snapshots? (Same scope question as UC-56 "do we ever compile-to-static ourselves," now for literate sources.)
  2. Is UC-83 distinct enough from UC-79 (compile-to-static) to stand alone, or should UC-79 be re-read as the single-output special case of UC-83's N-projection general case? (Recorded as a possible later consolidation; kept separate now because UC-83's projections are co-equal and serve semantically different audiences, rather than feeding one publish target.)
  3. How is output→source provenance represented when a derived line came from a chunk accreted across several source locations (the cross-ref is many-to-one)?

9. Sources

  • Knuth, Literate Programming (1984, Computer Journal); the WEB user manual; TeX: The Program (WEB) and MMIX (CWEB) as canonical exemplars.
  • Ramsey, noweb (a simple, extensible literate-programming tool).
  • CWEB (Knuth & Levy); Emacs Org-mode babel (tangle + evaluate); Sweave/knitr (R); Jupytext (notebook-as-text).
  • prior: research/260614-ikiwiki-deep-dive/ (compile-to-static, canonical-vs-derived); research/260614-xanadu-deep-dive/ (compose-by-reference / EDL, UC-44).

10. Traceability

New UC UC-83 carries the marker in the federation column of spec/UseCaseCatalog.md (true lineage = this dive; literate programming is design prior art, not a candidate shard, so the marker sits with the projection/compose-by-reference family). Enriched: UC-32, UC-44, UC-79; links UC-54. Architecture cross-refs: SHARD-WP-0002 T12 (one-source-many-projections page shape), T16 (replication- vs derivation-projection; named-chunk transclusion).