generated from coulomb/repo-seed
One source → N co-equal derived projections (weave=docs, tangle=code); named-chunk transclusion; splits replication- vs derivation-projection. Generalizes compile-to-static (UC-79). Enriches UC-32/44/79; links UC-54. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
181 lines
11 KiB
Markdown
181 lines
11 KiB
Markdown
# Literate Programming (Knuth's WEB / weave / tangle) — deep dive (findings)
|
|
|
|
**Date:** 2026-06-14 · **Source:** SHARD-WP-0004 T1 · **Subject:** Donald Knuth's WEB
|
|
system and the literate-programming model (`weave`/`tangle`, named chunks).
|
|
|
|
## Why this dive
|
|
|
|
SHARD-WP-0004 asks the carried question: *can a shard-wiki page be a source that is
|
|
woven/evaluated into rendered forms, and how do projection/transclusion/provenance treat
|
|
the source vs the output?* Literate programming is the **deepest ancestor** of that idea.
|
|
Knuth (1984) inverted the program/comment relationship: you write a **document** for
|
|
humans whose fragments happen to also be the program. From the **one WEB source** two
|
|
tools derive two artifacts: **`weave` → typeset documentation** (TeX) and **`tangle` →
|
|
compilable code** (Pascal/C/…). This is *one source, two projections* in its purest,
|
|
oldest form — the conceptual root of shard-wiki's **projection** and **transclusion**.
|
|
|
|
## 1. The WEB model: one source, two tools
|
|
|
|
A WEB file interleaves prose and code in author-chosen order (the order that best
|
|
*explains*, not the order the compiler needs). Two programs read it:
|
|
|
|
- **`weave`** produces a `.tex` file → typeset documentation: prose, pretty-printed code,
|
|
a cross-reference index of where each chunk and identifier is defined/used.
|
|
- **`tangle`** produces a compilable source file → reorders and expands the code chunks
|
|
into the sequence the compiler demands, strips the prose, macro-expands references.
|
|
|
|
The crucial property: **the two outputs are co-derived from one canonical source and are
|
|
semantically different audiences** (human reader vs compiler). Neither output is the
|
|
source; both are **regenerable, disposable derivations**. Editing happens on the WEB; you
|
|
never edit the woven `.tex` or the tangled `.c`.
|
|
|
|
## 2. Named chunks = transclusion of code fragments
|
|
|
|
The organizing primitive is the **named section / code chunk**:
|
|
|
|
- A chunk is declared `<<name>>=` and **referenced** elsewhere as `<<name>>`. `tangle`
|
|
expands references in place (recursively) to assemble the final program.
|
|
- A chunk name can be **defined in multiple places** and is *accreted* (later `+=`
|
|
additions append) — so one logical unit is authored across scattered locations.
|
|
- References can appear before definitions; resolution is by name, not by position.
|
|
|
|
This is **transclusion by reference** (UC-32) and **compose-by-reference / manifest**
|
|
(UC-44, the EDL/Xanadu lineage): the document is a graph of named fragments assembled at
|
|
derivation time. Knuth's chunk graph is the same shape as Xanadu's reference-not-copy and
|
|
shard-wiki's "compose a page from referenced parts" — applied to *executable* content and
|
|
resolved by a build tool rather than a viewer.
|
|
|
|
## 3. The descendants (noweb, CWEB, org-babel, Sweave/knitr, Jupytext)
|
|
|
|
- **CWEB** (Knuth/Levy): WEB for C. **noweb** (Ramsey): language-agnostic, minimal markup
|
|
(`<<chunk>>`), `notangle`/`noweave` — proof that the *model* (chunks + two projections)
|
|
is separable from any one language or typesetter.
|
|
- **org-babel** (Emacs Org-mode): named source blocks, `:noweb` references, **tangle** to
|
|
files **and evaluate** blocks inline (results captured back into the document) — literate
|
|
programming fused with notebook execution (bridges to T2/T3).
|
|
- **Sweave / knitr** (R): weave prose + R, executing code and **interleaving computed
|
|
results/figures** into the woven document — adds the *computed-output* dimension that
|
|
Jupyter (T3) centers on.
|
|
- **Jupytext**: represents a Jupyter notebook **as** a literate text/Markdown source —
|
|
closing the loop: the notebook (T3) becomes a weave/tangle-style plain-text source.
|
|
|
|
The throughline: **one canonical source → N derived projections**, where projections may
|
|
be (a) reformatted (weave), (b) reordered/extracted (tangle), or (c) **evaluated**
|
|
(org-babel/knitr) — the evaluated case is exactly the computational-page question.
|
|
|
|
## 4. Capability profile (as a would-be shard / page type)
|
|
|
|
| Dimension (synthesis spectrum) | Literate-programming source |
|
|
|--------------------------------|-----------------------------|
|
|
| Attachment mode | file-store (a WEB/`.nw`/`.org` text source in a repo) |
|
|
| Addressing granularity | document; **named chunk** as sub-page address |
|
|
| Content identity | source file path + chunk name (name-resolved, not position) |
|
|
| Structure | **graph of named chunks** assembled by reference |
|
|
| History | whatever VCS holds the source (git) — text, diffable |
|
|
| Merge model | text/git merge on the source |
|
|
| Native query | none; `weave` emits a cross-reference index (derived) |
|
|
| Translation | source → woven docs **and** tangled code (build-time) |
|
|
| Write granularity | file / **chunk** (text region) |
|
|
| Operational envelope | a build tool (`tangle`/`weave`/`noweb`/babel) |
|
|
| Content opacity | transparent text |
|
|
| Provenance | VCS author/time; chunk cross-ref maps output→source location |
|
|
| **Projection model** | **one source → many co-equal derived projections** (new emphasis) |
|
|
|
|
## 5. INTENT mapping
|
|
|
|
### Reinforcements
|
|
|
|
- **Projection (canonical vs derived).** Literate programming is the archetype of
|
|
"canonical source, regenerable derived view" — the same principle as ikiwiki
|
|
compile-to-static (UC-79), but generalized: **two-plus co-equal projections** from one
|
|
source (docs *and* code), not a single output. Strengthens the rule *never confuse a
|
|
projection for the source*.
|
|
- **Transclusion / compose-by-reference.** Named chunks are transclusion (UC-32) and a
|
|
manifest of referenced parts (UC-44) — resolved at derivation time. Confirms
|
|
transclusion=clone=embed=reference as one primitive that also covers *fragment assembly
|
|
of executable content*.
|
|
- **Markdown-first but backend-neutral page model.** noweb/org/Jupytext show the literate
|
|
source can *be* Markdown-ish plain text — so a "literate page" fits the text-first model;
|
|
the *derivations* are the non-text part.
|
|
- **Mechanism over policy.** weave/tangle are mechanisms; *which* projections to
|
|
materialize, when to regenerate, and where outputs go stay configurable.
|
|
|
|
### Divergences / boundaries
|
|
|
|
- **shard-wiki is not a build system.** It should *recognize and present* a
|
|
source-with-projections (attach the source, surface derived views with provenance), not
|
|
re-implement `tangle`/kernels. Materializing a projection may delegate to the source's
|
|
own tool or be capability-gated to "snapshot only."
|
|
- **The interesting projection is derivation, not caching.** shard-wiki's base projection
|
|
is cache-like (lazy copy of remote content, UC-53/57). Weave/tangle is a *different*
|
|
projection species: **transform/derive** one source into rendered forms. The contract
|
|
should model projection as having (at least) two kinds: **replication-projection** and
|
|
**derivation-projection** (compile/weave/evaluate).
|
|
|
|
### What to keep
|
|
|
|
1. **One-source-many-projections** as a first-class page-model + projection pattern
|
|
(generalizes UC-79's single output) — see UC-83.
|
|
2. **Named-chunk transclusion** as the executable-content face of UC-32/UC-44 (assembly by
|
|
reference at derivation time).
|
|
3. **Output→source provenance** (the woven cross-ref index): every derived view must point
|
|
back to the exact source location it came from — never present derived output without
|
|
that link (union-without-erasure for derivations).
|
|
|
|
## 6. UC seed
|
|
|
|
| # | Seed | Disposition |
|
|
|---|------|-------------|
|
|
| UC-83 | Attach a **single-source-multiple-projection** artifact (a literate/woven source): treat the source as canonical, present each **derived projection** (e.g. a documentation view *and* a code view) with **provenance back to the one source**, edits target the source and projections regenerate (delegated to the source's tool or degraded to a static snapshot) | **new** |
|
|
| — | Named chunks `<<name>>` = transclusion / compose-by-reference of (executable) fragments | enrich **UC-32**, **UC-44** |
|
|
| — | Generalize compile-to-static (single output) to **N co-equal projections** from one source | enrich **UC-79** |
|
|
| — | Computed/evaluated projection (org-babel/knitr) = derivation-projection with results | links **UC-54**, foreshadows **T3** |
|
|
|
|
## 7. Architecture notes for SHARD-WP-0002
|
|
|
|
- **T12 (page model):** add **one-source-many-projections** as a page-model shape — a page
|
|
may be a *source* whose presented forms are **derivations** (woven docs, tangled code,
|
|
evaluated results), each carrying output→source provenance. Distinct from prose, typed
|
|
records, query-defined, and inline-embedded objects already logged.
|
|
- **T16 (projection):** split projection into **replication-projection** (lazy cache of
|
|
remote content — current default) vs **derivation-projection** (transform/compile/weave/
|
|
evaluate a source into rendered forms). Derivation-projection is regenerable, may
|
|
delegate to an external tool, and degrades to a captured snapshot when the tool is
|
|
absent (graceful degradation).
|
|
- **Transclusion (T16):** named-chunk-by-name resolution is a transclusion variant where
|
|
the *target is a fragment resolved by name across the source*, assembled at derivation
|
|
time — a concrete shape for UC-32/UC-44 mechanics.
|
|
- **Capability gating:** "can derive projection X" is a capability; a shard that can't run
|
|
the tool still exposes the source + any pre-built/snapshot projections (UC-83 degrade).
|
|
|
|
## 8. Open questions
|
|
|
|
1. Does shard-wiki ever *drive* a derivation (run weave/tangle/evaluate), or only attach
|
|
sources and surface pre-built projections + snapshots? (Same scope question as UC-56
|
|
"do we ever compile-to-static ourselves," now for literate sources.)
|
|
2. Is **UC-83** distinct enough from UC-79 (compile-to-static) to stand alone, or should
|
|
UC-79 be re-read as the single-output special case of UC-83's N-projection general case?
|
|
(Recorded as a possible later consolidation; kept separate now because UC-83's
|
|
projections are *co-equal and semantically different audiences*, not one publish target.)
|
|
3. How is **output→source provenance** represented when a derived line came from a chunk
|
|
accreted across several source locations (the cross-ref is many-to-one)?
|
|
|
|
## 9. Sources
|
|
|
|
- Knuth, *Literate Programming* (1984, *Computer Journal*); the WEB user manual; *TeX: The
|
|
Program* / *MMIX* as canonical WEB exemplars.
|
|
- Ramsey, *noweb* (a simple, extensible literate-programming tool).
|
|
- CWEB (Knuth & Levy); Emacs **Org-mode babel** (tangle + evaluate); **Sweave**/**knitr**
|
|
(R); **Jupytext** (notebook-as-text).
|
|
- prior: `research/260614-ikiwiki-deep-dive/` (compile-to-static, canonical-vs-derived);
|
|
`research/260614-xanadu-deep-dive/` (compose-by-reference / EDL, UC-44).
|
|
|
|
## 10. Traceability
|
|
|
|
New UC **UC-83** carries the marker **⊛** in the federation column of
|
|
`spec/UseCaseCatalog.md` (true lineage = this dive; literate programming is design prior
|
|
art, not a candidate shard, so the marker sits with the projection/compose-by-reference
|
|
family). Enriched: UC-32, UC-44, UC-79; links UC-54. Architecture cross-refs: SHARD-WP-0002
|
|
T12 (one-source-many-projections page shape), T16 (replication- vs derivation-projection;
|
|
named-chunk transclusion).
|