generated from coulomb/repo-seed
Enhanced usecase example weave tangle for later workflows
docs/content-reference-literate-workflow-research.md
@@ -0,0 +1,311 @@
# Content References, Processors, and Literate Workflows

Date: 2026-05-04

## Purpose

This note records the follow-up research after the first transform, compose,
and include implementation. The goal is to keep `markitect-tool` close to
Markdown while preserving the richer ideas that made `markitect-main`
interesting: reversible explode/implode, transclusion, processors, namespaces,
content references, and Knuth-style weave/tangle workflows.

## Research Inputs

- [WEB on CTAN](https://ctan.org/pkg/web) and
  [Knuth/Levy CWEB](https://cs.stanford.edu/~knuth/cweb.html): literate source
  is processed in two directions, one for compilable source and one for
  readable documentation.
- [noweb Hacker's Guide](https://www.cs.tufts.edu/~nr/noweb/guide.html):
  language-independent literate programming benefits from a pipeline
  representation and named chunks that tools can extend.
- [Org Babel](https://orgmode.org/worg/org-contrib/babel/intro.html): source
  blocks can be executable, parameterized, named, reused, tangled to files, and
  woven into reproducible documents.
- [CommonMark fenced code blocks](https://spec.commonmark.org/0.31.2/#fenced-code-blocks):
  fenced blocks are first-class Markdown structure and must be handled by the
  parser, not by naive global text rewrites.
- [Asciidoctor include directives](https://docs.asciidoctor.org/asciidoc/latest/directives/include/)
  and [tagged regions](https://docs.asciidoctor.org/asciidoc/latest/directives/include-tagged-regions/):
  includes need predictable base-dir resolution, safe-mode boundaries, line and
  tag selection, and source-code-region reuse.
- [Sphinx literalinclude](https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html):
  code inclusion commonly needs line ranges, object-level extraction,
  highlighting metadata, dedent, and original line-number handling.
- [DITA conref](https://docs.oasis-open.org/dita/dita/v1.3/os/part2-tech-content/archSpec/base/conref.html)
  and [conkeyref](https://docs.oasis-open.org/dita/dita/v1.3/os/part1-base/langRef/attributes/theconkeyrefattribute.html):
  content reuse becomes much stronger when references have IDs, keys, scoped
  indirection, validity checks, and clear attribute merge rules.
- [W3C XInclude](https://www.w3.org/TR/xinclude/): inclusion should have an
  explicit processing model, target addressing, and fallback behavior.
- [JSON-LD 1.1 contexts](https://www.w3.org/TR/json-ld/): namespaces can map
  short terms to stable global identifiers while retaining compact authoring.
- [Python C3 MRO](https://www.python.org/download/releases/2.3/mro/) and
  [CLOS concepts](https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node261.html):
  multiple inheritance needs deterministic linearization, monotonicity, local
  precedence, and explicit method/slot combination rules.
- [Pandoc filters](https://pandoc.org/filters.html): processors can be cleanly
  modeled as AST transformations over document nodes and code blocks.

## Lessons

Markdown can carry a surprisingly rich system if the extra semantics are placed
in stable, inspectable constructs:

- Frontmatter declares document-level identity, namespaces, and defaults.
- Headings and fenced blocks become addressable content units.
- Include/transclusion is a resolver over content references, not only file
  expansion.
- Processors operate on typed blocks and produce diagnostics, dependencies,
  generated content, or files.
- Weave/tangle is a special case of named content units plus processor targets.
- Explode/implode needs a manifest with source spans and stable IDs so the
  directory form is not a lossy export.
- Multiple inheritance is useful for document templates, regulatory overlays,
  style/persona overlays, and reusable content classes, but only if merge
  behavior is deterministic and diagnosable.

## Use Cases

### 1. Reversible Large-Document Editing

An author explodes a long PRD/FRS into a directory, edits sections in separate
files, then implodes it back into a canonical single Markdown document. The
manifest preserves frontmatter policy, heading levels, ordering, source spans,
and generated filenames.

### 2. Knuth-Style Markdown Weave/Tangle

A document explains a program in the order best for human understanding. Named
code chunks are declared in fenced blocks, cross-reference each other, and
tangle into one or more source files. The woven output keeps prose, chunk
cross-links, and, optionally, generated indexes.

### 3. Executable Documentation Pipelines

Fenced blocks act as processors: shell, Python, SQL, validation, diagram, or
custom processors can consume inputs, emit outputs, and record dependencies.
Execution is optional and controlled; pure transforms remain deterministic.

### 4. Reusable Legal, Contract, and Product Clauses

Common clauses are defined once with stable IDs. Documents include them by
namespace/key and can select variants by jurisdiction, customer type, language,
or document class. Diagnostics explain missing keys and conflicting variants.

### 5. Source Snippet Documentation

Docs include code by tag, line range, parser object, or named block while
preserving source line references. This supports API docs, changelog examples,
and tutorials that stay aligned with source files.

### 6. Content Classes and Multiple Inheritance

A document can be treated as an instance of several content classes: for
example `base:prd`, `market:enterprise`, `jurisdiction:eu`, and
`style:board-brief`. Slot values, assertions, sections, and snippets resolve in
a deterministic order with explicit merge strategies.

### 7. Agent Context Packages

An agent can request a namespace, topic, chunk, section, or graph slice and get
a bounded context package with provenance, dependencies, hashes, and security
labels. This dovetails with later cache and memory work.

### 8. Security-Sensitive Knowledge Gateways

References and processor outputs carry labels. Policy can filter or redact
content before transclusion, weaving, tangling, or context-package creation.

## Architecture Blueprint

### Content Unit Model

The parser should expose addressable units beyond the current document,
section, and block lists:

- document
- frontmatter path
- section
- block
- fenced block
- named region
- named chunk
- processor result

Each unit should have:

- stable local ID
- optional global name
- source path and source span
- kind/type
- content hash
- dependency list
- labels/policy metadata

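As a rough illustration of the unit model above, the sketch below expresses the listed attributes as a Python dataclass. The names `ContentUnit` and `SourceSpan` and every field name are assumptions made for this example, not an existing `markitect-tool` API, and the content hash is assumed to be SHA-256 over the UTF-8 content.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSpan:
    """Half-open line range [start_line, end_line) within a source file."""
    start_line: int
    end_line: int


@dataclass(frozen=True)
class ContentUnit:
    """One addressable unit. All field names are hypothetical."""
    local_id: str                          # stable local ID
    kind: str                              # e.g. "section", "named_region"
    source_path: str
    span: SourceSpan
    content: str
    global_name: str | None = None         # optional global name
    dependencies: tuple[str, ...] = ()     # IDs of units this one depends on
    labels: frozenset[str] = frozenset()   # labels/policy metadata

    @property
    def content_hash(self) -> str:
        """SHA-256 of the UTF-8 content, so any edit changes the hash."""
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()


unit = ContentUnit(
    local_id="overview",
    kind="named_region",
    source_path="docs/intro.md",
    span=SourceSpan(3, 5),
    content="Reusable content.\n",
    labels=frozenset({"public"}),
)
print(unit.local_id, unit.content_hash[:12])
```

Deriving the hash from the content instead of storing it keeps the two from drifting apart; a real implementation might cache it once units become large.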
### Reference Syntax

Keep Markdown readable and allow several levels of precision:

```markdown
<!-- mkt:include ref="std:clauses/payment" -->
<!-- mkt:include path="sections/intro.md" selector="sections[heading=Summary]" -->
<!-- mkt:include ref="src:api#tag:create-user" mode="literal" -->
```

Frontmatter can define namespaces:

```yaml
namespaces:
  std: ./standards/
  src: ../src/
  contract: ./contracts/
```

References should resolve through a single resolver API:

```text
namespace + address + selector + mode + context -> resolved content unit(s)
```

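A minimal sketch of the first half of that resolver, assuming references take the form `namespace:address#selector` as in the examples above. `ContentRef`, `parse_ref`, and `resolve_path` are hypothetical names; mode and context handling are left out, and the function only computes a path without touching the file system.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ContentRef:
    namespace: str
    address: str
    selector: str | None = None


def parse_ref(ref: str) -> ContentRef:
    """Split 'ns:address#selector' into parts; the selector is optional."""
    namespace, sep, rest = ref.partition(":")
    if not sep or not namespace or not rest:
        raise ValueError(f"reference needs a 'namespace:address' form: {ref!r}")
    address, _, selector = rest.partition("#")
    return ContentRef(namespace, address, selector or None)


def resolve_path(ref: ContentRef, namespaces: dict[str, str]) -> PurePosixPath:
    """Map a reference to a path under its namespace root (no file access)."""
    if ref.namespace not in namespaces:
        raise KeyError(f"unknown namespace: {ref.namespace!r}")
    # Reject addresses that would climb out of the namespace root.
    if ref.address.startswith("/") or ".." in PurePosixPath(ref.address).parts:
        raise ValueError(f"address escapes namespace root: {ref.address!r}")
    return PurePosixPath(namespaces[ref.namespace]) / ref.address


namespaces = {"std": "./standards/", "src": "../src/", "contract": "./contracts/"}
ref = parse_ref("src:api#tag:create-user")
print(ref)
print(resolve_path(parse_ref("std:clauses/payment"), namespaces))
```

The escape check is deliberately strict: it mirrors the safe include path boundaries the current implementation already enforces, applied per namespace.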
### Region and Chunk Syntax

Use comments for regions so they can live inside Markdown or source files:

```markdown
<!-- mkt:region id="overview" -->
Reusable content.
<!-- /mkt:region -->
```

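For illustration only, region markers like the ones above could be collected with a regular expression. This sketch assumes non-nested regions and does not skip markers that appear inside fenced code blocks, which a parser-aware implementation would need to do; `extract_regions` is a hypothetical helper name.

```python
import re

# Matches an opening marker, the region body, and the closing marker.
REGION_RE = re.compile(
    r'<!--\s*mkt:region\s+id="(?P<id>[^"]+)"\s*-->\n'
    r'(?P<body>.*?)'
    r'<!--\s*/mkt:region\s*-->',
    re.DOTALL,
)


def extract_regions(text: str) -> dict:
    """Return {region id: body text}; duplicate IDs are reported as errors."""
    regions = {}
    for match in REGION_RE.finditer(text):
        region_id = match.group("id")
        if region_id in regions:
            raise ValueError(f"duplicate region id: {region_id!r}")
        regions[region_id] = match.group("body")
    return regions


sample = (
    "Intro paragraph.\n"
    '<!-- mkt:region id="overview" -->\n'
    "Reusable content.\n"
    "<!-- /mkt:region -->\n"
)
print(extract_regions(sample))
```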
Use fenced blocks for executable or tangleable chunks:

````markdown
```python {#load-config tangle="src/config.py"}
def load_config(path):
    return {}
```
````

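A small sketch of how such an info string might be split into language, chunk ID, and attributes. The attribute grammar is an assumption modeled on the example above (`#id` plus `key="value"` pairs inside braces), and `parse_info_string` is a hypothetical name.

```python
import re
import shlex

# Language word, then an optional {...} attribute group.
INFO_RE = re.compile(r"^(?P<lang>[^\s{]*)\s*(?:\{(?P<attrs>.*)\})?\s*$")


def parse_info_string(info: str):
    """Return (language, chunk_id, attributes) for a fenced-block info string."""
    match = INFO_RE.match(info.strip())
    if match is None:
        raise ValueError(f"unsupported info string: {info!r}")
    chunk_id = None
    attrs = {}
    # shlex.split honors double quotes, so quoted values may contain spaces.
    for token in shlex.split(match.group("attrs") or ""):
        if token.startswith("#"):
            chunk_id = token[1:]
        elif "=" in token:
            key, _, value = token.partition("=")
            attrs[key] = value
        else:
            raise ValueError(f"unsupported attribute: {token!r}")
    return match.group("lang"), chunk_id, attrs


print(parse_info_string('python {#load-config tangle="src/config.py"}'))
```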
Chunk references can stay close to noweb:

```text
<<load-config>>
```

The processor layer decides whether chunk references are expanded during
tangle, displayed during weave, or left literal.

### Processor Registry

Processors should be pluggable but explicit. A processor receives:

- unit content and metadata
- resolver
- execution context
- policy context
- output target request

It returns:

- transformed content, generated files, or computed values
- diagnostics
- dependency edges
- provenance events

Core processors should start deterministic: include, region, explode/implode,
tangle, weave, and simple text/Markdown transforms. Executing arbitrary code is
a later, opt-in capability.

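One way the registry contract above could look, as a sketch under stated assumptions: `ProcessorRequest`, `ProcessorResult`, and `ProcessorRegistry` are hypothetical names, and the resolver, execution context, policy context, and provenance events are omitted to keep the example short.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProcessorRequest:
    """Inputs for one run (resolver and policy context omitted here)."""
    content: str
    metadata: dict[str, str] = field(default_factory=dict)
    target: str | None = None


@dataclass
class ProcessorResult:
    content: str | None = None
    files: dict[str, str] = field(default_factory=dict)
    diagnostics: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)


Processor = Callable[[ProcessorRequest], ProcessorResult]


class ProcessorRegistry:
    """Explicit name -> processor mapping; nothing is discovered implicitly."""

    def __init__(self) -> None:
        self._processors: dict[str, Processor] = {}

    def register(self, name: str, processor: Processor) -> None:
        if name in self._processors:
            raise ValueError(f"processor already registered: {name!r}")
        self._processors[name] = processor

    def run(self, name: str, request: ProcessorRequest) -> ProcessorResult:
        processor = self._processors.get(name)
        if processor is None:
            # Unknown processors become diagnostics instead of exceptions.
            return ProcessorResult(diagnostics=[f"unknown processor: {name!r}"])
        return processor(request)


def strip_trailing_whitespace(request: ProcessorRequest) -> ProcessorResult:
    """A deterministic text transform used as the example processor."""
    lines = [line.rstrip() for line in request.content.splitlines()]
    return ProcessorResult(content="\n".join(lines) + "\n")


registry = ProcessorRegistry()
registry.register("strip-trailing-whitespace", strip_trailing_whitespace)
result = registry.run("strip-trailing-whitespace", ProcessorRequest("a  \nb\t\n"))
print(repr(result.content))
```

Returning a diagnostic for an unknown processor, instead of raising, follows the note's preference for structured diagnostics over fail-fast exceptions.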
### Explode/Implode

Explode/implode should become a first-class reversible operation, not a loose
directory export. The manifest should include:

- original path and hash
- variant type (`flat`, `hierarchical`, `semantic`)
- frontmatter preservation policy
- section/chunk/source-span entries
- file paths and order
- heading-level policy
- warnings and lossless round-trip checks

The old `markitect-main` flat/hierarchical/semantic variants are worth
reimplementing behind a small variant interface.

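The round-trip property can be sketched with a deliberately naive `flat` variant that splits at every line starting with `## `. The manifest keys, the generated file names, and the functions `explode` and `implode` are assumptions for illustration; the split is not fence-aware, although concatenation still restores the original text exactly.

```python
from __future__ import annotations

import hashlib


def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def explode(text: str) -> tuple[dict, dict[str, str]]:
    """Split at level-2 headings; return (manifest, {path: content})."""
    lines = text.splitlines(keepends=True)
    starts = [0] + [i for i, line in enumerate(lines)
                    if i > 0 and line.startswith("## ")]
    ends = starts[1:] + [len(lines)]
    entries, files = [], {}
    for order, (start, end) in enumerate(zip(starts, ends)):
        path = f"{order:03d}.md"
        files[path] = "".join(lines[start:end])
        entries.append({"path": path, "span": [start, end],
                        "hash": _sha256(files[path])})
    manifest = {"variant": "flat", "original_hash": _sha256(text),
                "entries": entries}
    return manifest, files


def implode(manifest: dict, files: dict[str, str]) -> tuple[str, list[str]]:
    """Rejoin parts in manifest order; return (text, diagnostics)."""
    diagnostics, parts = [], []
    for entry in manifest["entries"]:
        part = files[entry["path"]]
        if _sha256(part) != entry["hash"]:
            diagnostics.append(f"edited since explode: {entry['path']}")
        parts.append(part)
    text = "".join(parts)
    if not diagnostics and _sha256(text) != manifest["original_hash"]:
        diagnostics.append("round trip does not reproduce the original")
    return text, diagnostics


doc = "# T\n\nintro\n\n## A\n\na\n\n## B\n\nb\n"
manifest, files = explode(doc)
restored, diagnostics = implode(manifest, files)
print(sorted(files), restored == doc, diagnostics)
```

Per-part hashes let implode tell an intentional edit apart from a lossy export, which is the distinction the manifest is meant to preserve.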
### Weave/Tangle

Tangle extracts named chunks to target files, expanding chunk references in a
deterministic dependency order. Weave renders human-readable documentation with
chunk backlinks and optional source indexes.

Minimum useful scope for an MVP:

- discover named fenced blocks
- support `tangle="<path>"`
- concatenate multiple chunks for the same target in document order
- expand `<<chunk-id>>` inside code
- detect missing/cyclic chunk references
- optionally emit source-mapping comments

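The expansion and error-detection items in that list can be sketched as follows. The input is assumed to be already-discovered blocks as `(chunk_id, target, code)` tuples in document order, with unique chunk IDs; `TangleError`, `expand`, and `tangle` are hypothetical names, and source-mapping comments are not emitted.

```python
import re

# A reference must sit alone on its line; its indentation is reused.
CHUNK_REF = re.compile(r"^(?P<indent>[ \t]*)<<(?P<name>[A-Za-z0-9_.-]+)>>[ \t]*$")


class TangleError(Exception):
    """Raised for missing or cyclic chunk references."""


def expand(name, chunks, stack=()):
    """Return the code of chunk `name` with all references expanded."""
    if name in stack:
        raise TangleError("cyclic chunk reference: " + " -> ".join(stack + (name,)))
    if name not in chunks:
        raise TangleError(f"missing chunk: {name}")
    out = []
    for line in chunks[name].splitlines():
        match = CHUNK_REF.match(line)
        if match is None:
            out.append(line)
            continue
        nested = expand(match.group("name"), chunks, stack + (name,))
        indent = match.group("indent")
        out.extend(indent + l if l else l for l in nested.splitlines())
    return "\n".join(out) + "\n"


def tangle(blocks):
    """Return {target path: file content}, concatenating in document order."""
    chunks = {chunk_id: code for chunk_id, _, code in blocks}
    files = {}
    for chunk_id, target, _ in blocks:
        if target is not None:
            files[target] = files.get(target, "") + expand(chunk_id, chunks)
    return files


blocks = [
    ("main", "src/app.py", "<<imports>>\n\ndef main():\n    <<body>>\n"),
    ("imports", None, "import sys\n"),
    ("body", None, "print(sys.argv)\nreturn 0\n"),
]
print(tangle(blocks)["src/app.py"])
```

Carrying the reference's indentation into the expanded lines matters for indentation-sensitive targets such as Python; noweb itself behaves similarly for indented references.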
### Content Class and Multiple Inheritance

Document classes should be data, not Python inheritance. A class can define:

- slots
- required sections
- snippets
- assertions
- processors
- merge policies

An instance declares:

```yaml
document_class:
  extends:
    - contract:prd
    - market:enterprise
    - jurisdiction:eu
```

Resolution should use a C3-like linearization. Merge policies must be explicit:

- `replace`
- `append`
- `prepend`
- `deep_merge`
- `before:<slot>`
- `after:<slot>`
- `error_on_conflict`

Diagnostics should report inconsistent precedence, ambiguous slot definitions,
and merge-policy violations.

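To make the linearization step concrete, here is a compact sketch of C3 over plain data, where classes are names and `parents` maps each name to its ordered direct parents. The shared parent `base:document` is a hypothetical class added for the example, `c3_linearize` is an illustrative name, and merge policies are not applied here.

```python
def c3_linearize(name, parents, _visiting=()):
    """Return the C3 resolution order for `name` as a list of class names."""
    if name in _visiting:
        raise ValueError("inheritance cycle: " + " -> ".join(_visiting + (name,)))
    direct = list(parents.get(name, []))
    sequences = [c3_linearize(p, parents, _visiting + (name,)) for p in direct]
    sequences.append(direct)  # keeps local precedence order
    return [name] + _merge(sequences)


def _merge(sequences):
    """C3 merge: repeatedly take the first head that is in no other tail."""
    result = []
    sequences = [list(s) for s in sequences if s]
    while sequences:
        for seq in sequences:
            head = seq[0]
            if not any(head in other[1:] for other in sequences):
                break
        else:
            raise ValueError("inconsistent precedence order")
        result.append(head)
        sequences = [[x for x in s if x != head] for s in sequences]
        sequences = [s for s in sequences if s]
    return result


parents = {
    "doc": ["contract:prd", "market:enterprise", "jurisdiction:eu"],
    "contract:prd": ["base:document"],
    "market:enterprise": ["base:document"],
    "jurisdiction:eu": ["base:document"],
}
print(c3_linearize("doc", parents))
```

The `ValueError` for an unresolvable order is the natural source of the "inconsistent precedence" diagnostic; a fuller implementation would report which classes conflict.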
## Comparison with Current Implementation

What we have now is a good kernel:

- Parser/frontmatter/sections/blocks
- Contracts and deterministic diagnostics
- Query/extraction over structured documents
- Transform, compose, and include operations
- Safe include path boundaries and cycle checks

What is missing for the richer framework:

- stable content IDs and namespaces
- region/tag selectors
- fenced-block-aware transforms
- operation provenance and dependency graphs
- structured include diagnostics instead of fail-fast exceptions only
- reversible explode/implode with manifests
- processor registry
- named chunks and weave/tangle
- class/object composition with deterministic multi-inheritance
- line/source maps across generated outputs
- security labels and policy hooks on resolved units

The clean path is to keep current ops as the small deterministic surface and
grow this richer system as a framework layer. That protects simple CLI use while
opening a strong route to sophisticated knowledge/programming pipelines.
@@ -1,6 +1,6 @@
# Workplan Planning Map

Date: 2026-05-03
Date: 2026-05-04

## Purpose

@@ -30,8 +30,9 @@ and descriptions mirror the operational view.
| `MKTT-WP-0001` | complete | done | none | Repository foundation is complete. |
| `MKTT-WP-0002` | complete | done | `MKTT-WP-0001` | Legacy scope extraction is complete. |
| `MKTT-WP-0004` | complete | done | `MKTT-WP-0001`, `MKTT-WP-0002` | Contract framework is complete and informs later validation/generation work. |
| `MKTT-WP-0003` | P0 | active | `MKTT-WP-0001`, `MKTT-WP-0002`, `MKTT-WP-0004` | Mainline implementation. Continue with P3.5 transform/compose/include. |
| `MKTT-WP-0006` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T005` | Start after transform/composition shape is clear and before serious cache work. |
| `MKTT-WP-0003` | P0 | active | `MKTT-WP-0001`, `MKTT-WP-0002`, `MKTT-WP-0004` | Mainline implementation. P3.5 is complete; continue with P3.6 templating/generation hooks. |
| `MKTT-WP-0006` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T005` | Ready after transform/composition shape is clear; should account for future reference/provenance needs. |
| `MKTT-WP-0010` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Preserve richer content-reference, processor, explode/implode, and weave/tangle architecture after P3.6. |
| `MKTT-WP-0007` | P2 | todo | `MKTT-WP-0006` | First practical cache backend use case: AST/JSONPath/SQLite/FTS. |
| `MKTT-WP-0005` | P2 | todo | `MKTT-WP-0003`, `MKTT-WP-0004` | Pick up when generation/form/context or semantic assessment pressure appears. |
| `MKTT-WP-0009` | P2 | todo | `MKTT-WP-0006` | Establish access-control gateway before security-sensitive cache/context use. |
@@ -39,13 +40,19 @@ and descriptions mirror the operational view.

## Dependency Notes

The most important nuance is `MKTT-WP-0006`: it should not wait for every task
The first important nuance is `MKTT-WP-0006`: it should not wait for every task
in `MKTT-WP-0003`, because it should shape cache architecture before `P3.7`.
It should wait until `MKTT-WP-0003-T005` gives transform/composition enough
shape to know what cached identities and invalidation rules must preserve.

This is a mixed task/workstream dependency. State Hub does not currently model
that natively.
The second important nuance is `MKTT-WP-0010`: it captures richer content
reference, processor, explode/implode, and weave/tangle work. It should wait
until `MKTT-WP-0003-T006` defines the deterministic templating/generation hook
surface, but it should inform backend, index, context-memory, and access-control
architecture before those become rigid.

These are mixed task/workstream dependencies. State Hub does not currently model
them natively.

## State Hub Mirror

@@ -59,6 +66,7 @@ dependencies:
- `MKTT-WP-0003 -> MKTT-WP-0002`
- `MKTT-WP-0003 -> MKTT-WP-0004`
- `MKTT-WP-0006 -> MKTT-WP-0004`
- `MKTT-WP-0010 -> MKTT-WP-0004`
- `MKTT-WP-0007 -> MKTT-WP-0006`
- `MKTT-WP-0005 -> MKTT-WP-0003`
- `MKTT-WP-0005 -> MKTT-WP-0004`