diff --git a/docs/content-reference-literate-workflow-research.md b/docs/content-reference-literate-workflow-research.md new file mode 100644 index 0000000..1a36e03 --- /dev/null +++ b/docs/content-reference-literate-workflow-research.md @@ -0,0 +1,311 @@ +# Content References, Processors, and Literate Workflows + +Date: 2026-05-04 + +## Purpose + +This note records the follow-up research after the first transform, compose, +and include implementation. The goal is to keep `markitect-tool` close to +Markdown while preserving the richer ideas that made `markitect-main` +interesting: reversible explode/implode, transclusion, processors, namespaces, +content references, and Knuth-style weave/tangle workflows. + +## Research Inputs + +- [WEB on CTAN](https://ctan.org/pkg/web) and + [Knuth/Levy CWEB](https://cs.stanford.edu/~knuth/cweb.html): literate source + is processed in two directions, one for compilable source and one for + readable documentation. +- [noweb Hacker's Guide](https://www.cs.tufts.edu/~nr/noweb/guide.html): + language-independent literate programming benefits from a pipeline + representation and named chunks that tools can extend. +- [Org Babel](https://orgmode.org/worg/org-contrib/babel/intro.html): source + blocks can be executable, parameterized, named, reused, tangled to files, and + woven into reproducible documents. +- [CommonMark fenced code blocks](https://spec.commonmark.org/0.31.2/#fenced-code-blocks): + fenced blocks are first-class Markdown structure and must be handled by the + parser, not by naive global text rewrites. +- [Asciidoctor include directives](https://docs.asciidoctor.org/asciidoc/latest/directives/include/) + and [tagged regions](https://docs.asciidoctor.org/asciidoc/latest/directives/include-tagged-regions/): + includes need predictable base-dir resolution, safe-mode boundaries, line and + tag selection, and source-code-region reuse. 
+- [Sphinx literalinclude](https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html): + code inclusion commonly needs line ranges, object-level extraction, + highlighting metadata, dedent, and original line-number handling. +- [DITA conref](https://docs.oasis-open.org/dita/dita/v1.3/os/part2-tech-content/archSpec/base/conref.html) + and [conkeyref](https://docs.oasis-open.org/dita/dita/v1.3/os/part1-base/langRef/attributes/theconkeyrefattribute.html): + content reuse becomes much stronger when references have IDs, keys, scoped + indirection, validity checks, and clear attribute merge rules. +- [W3C XInclude](https://www.w3.org/TR/xinclude/): inclusion should have an + explicit processing model, target addressing, and fallback behavior. +- [JSON-LD 1.1 contexts](https://www.w3.org/TR/json-ld/): namespaces can map + short terms to stable global identifiers while retaining compact authoring. +- [Python C3 MRO](https://www.python.org/download/releases/2.3/mro/) and + [CLOS concepts](https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node261.html): + multiple inheritance needs deterministic linearization, monotonicity, local + precedence, and explicit method/slot combination rules. +- [Pandoc filters](https://pandoc.org/filters.html): processors can be cleanly + modeled as AST transformations over document nodes and code blocks. + +## Lessons + +Markdown can carry a surprisingly rich system if the extra semantics are placed +in stable, inspectable constructs: + +- Frontmatter declares document-level identity, namespaces, and defaults. +- Headings and fenced blocks become addressable content units. +- Include/transclusion is a resolver over content references, not only file + expansion. +- Processors operate on typed blocks and produce diagnostics, dependencies, + generated content, or files. +- Weave/tangle is a special case of named content units plus processor targets. 
+- Explode/implode needs a manifest with source spans and stable IDs so the + directory form is not a lossy export. +- Multiple inheritance is useful for document templates, regulatory overlays, + style/persona overlays, and reusable content classes, but only if merge + behavior is deterministic and diagnosable. + +## Use Cases + +### 1. Reversible Large-Document Editing + +An author explodes a long PRD/FRS into a directory, edits sections in separate +files, then implodes it back into a canonical single Markdown document. The +manifest preserves frontmatter policy, heading levels, ordering, source spans, +and generated filenames. + +### 2. Knuth-Style Markdown Weave/Tangle + +A document explains a program in the order best for human understanding. Named +code chunks are declared in fenced blocks, cross-reference each other, and +tangle into one or more source files. The woven output keeps prose, chunk +cross-links, and optionally generated indexes. + +### 3. Executable Documentation Pipelines + +Fenced blocks act as processors: shell, Python, SQL, validation, diagram, or +custom processors can consume inputs, emit outputs, and record dependencies. +Execution is optional and controlled; pure transforms remain deterministic. + +### 4. Reusable Legal, Contract, and Product Clauses + +Common clauses are defined once with stable IDs. Documents include them by +namespace/key and can select variants by jurisdiction, customer type, language, +or document class. Diagnostics explain missing keys and conflicting variants. + +### 5. Source Snippet Documentation + +Docs include code by tag, line range, parser object, or named block while +preserving source line references. This supports API docs, changelog examples, +and tutorials that stay aligned with source files. + +### 6. Content Classes and Multiple Inheritance + +A document can be treated as an instance of several content classes: for +example `base:prd`, `market:enterprise`, `jurisdiction:eu`, and +`style:board-brief`. 
Slot values, assertions, sections, and snippets resolve in +a deterministic order with explicit merge strategies. + +### 7. Agent Context Packages + +An agent can request a namespace, topic, chunk, section, or graph slice and get +a bounded context package with provenance, dependencies, hashes, and security +labels. This dovetails with later cache and memory work. + +### 8. Security-Sensitive Knowledge Gateways + +References and processor outputs carry labels. Policy can filter or redact +content before transclusion, weaving, tangling, or context-package creation. + +## Architecture Blueprint + +### Content Unit Model + +The parser should expose addressable units beyond the current document, +section, and block lists: + +- document +- frontmatter path +- section +- block +- fenced block +- named region +- named chunk +- processor result + +Each unit should have: + +- stable local ID +- optional global name +- source path and source span +- kind/type +- content hash +- dependency list +- labels/policy metadata + +### Reference Syntax + +Keep Markdown readable and allow several levels of precision: + +```markdown + + + +``` + +Frontmatter can define namespaces: + +```yaml +namespaces: + std: ./standards/ + src: ../src/ + contract: ./contracts/ +``` + +References should resolve through a single resolver API: + +```text +namespace + address + selector + mode + context -> resolved content unit(s) +``` + +### Region and Chunk Syntax + +Use comments for regions so they can live inside Markdown or source files: + +```markdown + +Reusable content. + +``` + +Use fenced blocks for executable or tangleable chunks: + +````markdown +```python {#load-config tangle="src/config.py"} +def load_config(path): + return {} +``` +```` + +Chunk references can stay close to noweb: + +```text +<> +``` + +The processor layer decides whether chunk references are expanded during +tangle, displayed during weave, or left literal. 
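The tangle-time expansion of chunk references can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it assumes references use the noweb form `<<name>>` on a line of their own (the reference syntax in this note is incomplete, so that form is an assumption), and `expand_chunk` and `ChunkError` are hypothetical names.

```python
import re

# Assumption: chunk references follow noweb's `<<name>>` form, alone on a line.
CHUNK_REF = re.compile(r"^(?P<indent>[ \t]*)<<(?P<name>[^<>]+)>>[ \t]*$")


class ChunkError(ValueError):
    """Raised for missing or cyclic chunk references (hypothetical name)."""


def expand_chunk(name, chunks, _stack=()):
    """Return the text of chunk `name` with nested references expanded.

    `chunks` maps chunk names to raw text. Expanded chunks are re-indented
    to match the indentation of the line that referenced them.
    """
    if name in _stack:
        cycle = " -> ".join(_stack + (name,))
        raise ChunkError(f"cyclic chunk reference: {cycle}")
    if name not in chunks:
        raise ChunkError(f"missing chunk: {name}")

    out = []
    for line in chunks[name].splitlines():
        match = CHUNK_REF.match(line)
        if match is None:
            out.append(line)  # ordinary code line, kept as-is
            continue
        indent = match.group("indent")
        expanded = expand_chunk(match.group("name").strip(), chunks, _stack + (name,))
        # Blank lines stay blank; other lines inherit the reference's indent.
        out.extend(indent + sub if sub else sub for sub in expanded.splitlines())
    return "\n".join(out)


chunks = {
    "main": "def main():\n    <<load-config>>\n    return config",
    "load-config": "config = {}",
}
print(expand_chunk("main", chunks))
```

Weave could reuse the same pattern to turn each reference into a cross-link instead of expanding it, which keeps the choice of behavior in the processor layer.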
+ +### Processor Registry + +Processors should be pluggable but explicit. A processor receives: + +- unit content and metadata +- resolver +- execution context +- policy context +- output target request + +It returns: + +- transformed content, generated files, or computed values +- diagnostics +- dependency edges +- provenance events + +Core processors should start deterministic: include, region, explode/implode, +tangle, weave, and simple text/Markdown transforms. Executing arbitrary code is +a later, opt-in capability. + +### Explode/Implode + +Explode/implode should become a first-class reversible operation, not a loose +directory export. The manifest should include: + +- original path and hash +- variant type (`flat`, `hierarchical`, `semantic`) +- frontmatter preservation policy +- section/chunk/source-span entries +- file paths and order +- heading-level policy +- warnings and non-lossy roundtrip checks + +The old `markitect-main` flat/hierarchical/semantic variants are worth +reimplementing behind a small variant interface. + +### Weave/Tangle + +Tangle extracts named chunks to target files, expanding chunk references in a +deterministic dependency order. Weave renders human-readable documentation with +chunk backlinks and optional source indexes. + +Minimum useful MVP: + +- discover named fenced blocks +- support `tangle=""` +- concatenate multiple chunks for the same target in document order +- expand `<>` inside code +- detect missing/cyclic chunk references +- emit source mapping comments optionally + +### Content Class and Multiple Inheritance + +Document classes should be data, not Python inheritance. A class can define: + +- slots +- required sections +- snippets +- assertions +- processors +- merge policies + +An instance declares: + +```yaml +document_class: + extends: + - contract:prd + - market:enterprise + - jurisdiction:eu +``` + +Resolution should use a C3-like linearization. 
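For orientation, a C3-style linearization over declared `extends` lists might look like the sketch below. It is an assumption-laden illustration rather than the planned resolver: the `extends` mapping, the `base:document` parent, and the function names are hypothetical, and cyclic `extends` declarations are not handled.

```python
def c3_merge(sequences):
    """Merge precedence lists with the C3 rule; raise if no consistent order exists."""
    sequences = [list(seq) for seq in sequences if seq]
    result = []
    while sequences:
        for seq in sequences:
            head = seq[0]
            # A head is acceptable only if it appears in no list's tail.
            if not any(head in other[1:] for other in sequences):
                break
        else:
            raise ValueError("inconsistent precedence: no valid C3 linearization")
        result.append(head)
        # Drop the chosen class everywhere, then discard exhausted lists.
        sequences = [[item for item in seq if item != head] for seq in sequences]
        sequences = [seq for seq in sequences if seq]
    return result


def linearize(name, extends):
    """Return the resolution order for `name`, most specific class first.

    `extends` maps each class name to the ordered list of classes it extends.
    Note: cycles in `extends` are not detected in this sketch.
    """
    parents = extends.get(name, [])
    return [name] + c3_merge(
        [linearize(parent, extends) for parent in parents] + [list(parents)]
    )


# Hypothetical class graph; `base:document` is invented for illustration.
extends = {
    "instance": ["contract:prd", "market:enterprise", "jurisdiction:eu"],
    "contract:prd": ["base:document"],
    "market:enterprise": ["base:document"],
    "jurisdiction:eu": [],
}
print(linearize("instance", extends))
```

The `ValueError` branch is where an inconsistent-precedence diagnostic would be raised in a fuller resolver.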
Merge policies must be explicit: + +- `replace` +- `append` +- `prepend` +- `deep_merge` +- `before:` +- `after:` +- `error_on_conflict` + +Diagnostics should report inconsistent precedence, ambiguous slot definitions, +and merge-policy violations. + +## Comparison with Current Implementation + +What we have now is a good kernel: + +- Parser/frontmatter/sections/blocks +- Contracts and deterministic diagnostics +- Query/extraction over structured documents +- Transform, compose, and include operations +- Safe include path boundaries and cycle checks + +What is missing for the richer framework: + +- stable content IDs and namespaces +- region/tag selectors +- fenced-block-aware transforms +- operation provenance and dependency graphs +- structured include diagnostics instead of fail-fast exceptions only +- reversible explode/implode with manifests +- processor registry +- named chunks and weave/tangle +- class/object composition with deterministic multi-inheritance +- line/source maps across generated outputs +- security labels and policy hooks on resolved units + +The clean path is to keep current ops as the small deterministic surface and +grow this richer system as a framework layer. That protects simple CLI use while +opening a strong route to sophisticated knowledge/programming pipelines. diff --git a/docs/workplan-planning-map.md b/docs/workplan-planning-map.md index 7c200ea..32c36f5 100644 --- a/docs/workplan-planning-map.md +++ b/docs/workplan-planning-map.md @@ -1,6 +1,6 @@ # Workplan Planning Map -Date: 2026-05-03 +Date: 2026-05-04 ## Purpose @@ -30,8 +30,9 @@ and descriptions mirror the operational view. | `MKTT-WP-0001` | complete | done | none | Repository foundation is complete. | | `MKTT-WP-0002` | complete | done | `MKTT-WP-0001` | Legacy scope extraction is complete. | | `MKTT-WP-0004` | complete | done | `MKTT-WP-0001`, `MKTT-WP-0002` | Contract framework is complete and informs later validation/generation work. 
| -| `MKTT-WP-0003` | P0 | active | `MKTT-WP-0001`, `MKTT-WP-0002`, `MKTT-WP-0004` | Mainline implementation. Continue with P3.5 transform/compose/include. | -| `MKTT-WP-0006` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T005` | Start after transform/composition shape is clear and before serious cache work. | +| `MKTT-WP-0003` | P0 | active | `MKTT-WP-0001`, `MKTT-WP-0002`, `MKTT-WP-0004` | Mainline implementation. P3.5 is complete; continue with P3.6 templating/generation hooks. | +| `MKTT-WP-0006` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T005` | Ready after transform/composition shape is clear; should account for future reference/provenance needs. | +| `MKTT-WP-0010` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Preserve richer content-reference, processor, explode/implode, and weave/tangle architecture after P3.6. | | `MKTT-WP-0007` | P2 | todo | `MKTT-WP-0006` | First practical cache backend use case: AST/JSONPath/SQLite/FTS. | | `MKTT-WP-0005` | P2 | todo | `MKTT-WP-0003`, `MKTT-WP-0004` | Pick up when generation/form/context or semantic assessment pressure appears. | | `MKTT-WP-0009` | P2 | todo | `MKTT-WP-0006` | Establish access-control gateway before security-sensitive cache/context use. | @@ -39,13 +40,19 @@ and descriptions mirror the operational view. ## Dependency Notes -The most important nuance is `MKTT-WP-0006`: it should not wait for every task +The first important nuance is `MKTT-WP-0006`: it should not wait for every task in `MKTT-WP-0003`, because it should shape cache architecture before `P3.7`. It should wait until `MKTT-WP-0003-T005` gives transform/composition enough shape to know what cached identities and invalidation rules must preserve. -This is a mixed task/workstream dependency. State Hub does not currently model -that natively. +The second important nuance is `MKTT-WP-0010`: it captures richer content +reference, processor, explode/implode, and weave/tangle work. 
It should wait +until `MKTT-WP-0003-T006` defines the deterministic templating/generation hook +surface, but it should inform backend, index, context-memory, and access-control +architecture before those become rigid. + +These are mixed task/workstream dependencies. State Hub does not currently model +them natively. ## State Hub Mirror @@ -59,6 +66,7 @@ dependencies: - `MKTT-WP-0003 -> MKTT-WP-0002` - `MKTT-WP-0003 -> MKTT-WP-0004` - `MKTT-WP-0006 -> MKTT-WP-0004` +- `MKTT-WP-0010 -> MKTT-WP-0004` - `MKTT-WP-0007 -> MKTT-WP-0006` - `MKTT-WP-0005 -> MKTT-WP-0003` - `MKTT-WP-0005 -> MKTT-WP-0004` diff --git a/workplans/MKTT-WP-0003-core-toolkit-implementation.md b/workplans/MKTT-WP-0003-core-toolkit-implementation.md index 22de0bd..36b6234 100644 --- a/workplans/MKTT-WP-0003-core-toolkit-implementation.md +++ b/workplans/MKTT-WP-0003-core-toolkit-implementation.md @@ -116,6 +116,10 @@ LLM-assisted hooks supplied by external providers. Extension point: `EP-MKTT-001`. +Keep this slice focused on deterministic templates and generation hooks. Rich +processors, named chunks, weave/tangle, namespaces, and content-class +inheritance are captured in `MKTT-WP-0010` after this hook surface is clear. 
+ ## P3.7 - Add caching and incremental processing ```task diff --git a/workplans/MKTT-WP-0006-cache-backend-architecture-core.md b/workplans/MKTT-WP-0006-cache-backend-architecture-core.md index 48b3f69..d82578e 100644 --- a/workplans/MKTT-WP-0006-cache-backend-architecture-core.md +++ b/workplans/MKTT-WP-0006-cache-backend-architecture-core.md @@ -12,8 +12,10 @@ depends_on_workplans: - MKTT-WP-0004 depends_on_tasks: - MKTT-WP-0003-T005 +related_workplans: + - MKTT-WP-0010 created: "2026-05-03" -updated: "2026-05-03" +updated: "2026-05-04" state_hub_workstream_id: "0c585f8a-5c7e-4c89-b785-5b0089180256" --- @@ -31,6 +33,7 @@ Research and architecture are captured in: - `docs/research-lab-cache-backend-research.md` - `docs/cache-backend-architecture-blueprint.md` +- `docs/content-reference-literate-workflow-research.md` ## Decision @@ -38,6 +41,11 @@ Do not start this before the current deterministic transform/composition slice has enough shape to show what cache invalidation must preserve. Start it before WP-0003 P3.7 caching becomes implementation work. +Keep the backend fabric open for `MKTT-WP-0010` content-unit identities, +reference graphs, processor provenance, and weave/tangle source maps. Those +features do not need to be implemented here, but the capability model should not +make them awkward later. + ## P6.1 - Define backend capability model ```task @@ -70,6 +78,9 @@ state_hub_task_id: "5debc135-908a-47ed-ba15-564610970e38" Specify content-addressed document snapshots keyed by source content hash, parser version, parse options, and contract version where relevant. +Include a placeholder for stable content-unit identities and dependency edges so +references/chunks can be cached and invalidated later. + ## P6.3 - Define backend interfaces ```task @@ -82,6 +93,9 @@ state_hub_task_id: "a3e37112-1197-4f6f-8de8-7b3067ef060e" Add protocol classes for snapshot backends, index backends, query adapters, context package registries, and access policy gateways. 
+Leave room for processor-result stores, reference graph adapters, and source-map +or provenance adapters. + ## P6.4 - Implement local backend registry ```task diff --git a/workplans/MKTT-WP-0007-advanced-query-and-local-index-backend.md b/workplans/MKTT-WP-0007-advanced-query-and-local-index-backend.md index da46783..9bb7e78 100644 --- a/workplans/MKTT-WP-0007-advanced-query-and-local-index-backend.md +++ b/workplans/MKTT-WP-0007-advanced-query-and-local-index-backend.md @@ -10,8 +10,10 @@ planning_priority: P2 planning_order: 60 depends_on_workplans: - MKTT-WP-0006 +related_workplans: + - MKTT-WP-0010 created: "2026-05-03" -updated: "2026-05-03" +updated: "2026-05-04" state_hub_workstream_id: "d61a82e4-651a-4df2-944a-9ff996b2e1f6" --- @@ -22,6 +24,10 @@ state_hub_workstream_id: "d61a82e4-651a-4df2-944a-9ff996b2e1f6" Implement the first practical backend use case: cached AST introspection, JSONPath querying, SQLite metadata, and FTS5 search over Markdown documents. +This backend should later be able to index `MKTT-WP-0010` references, named +regions, chunks, and processor provenance without changing its basic storage +contract. + ## P7.1 - Implement local snapshot store ```task @@ -77,6 +83,9 @@ state_hub_task_id: "479f11a3-4ab4-451b-991c-7f143f2bffea" Persist source files, content hashes, frontmatter, headings, sections, blocks, and metrics in SQLite. +Keep schema extension points for reference edges, named regions, chunks, and +processor outputs. + ## P7.5 - Add FTS5 section/block search ```task @@ -100,6 +109,8 @@ state_hub_task_id: "7d9472e6-0716-435b-866c-d2c66ad786cf" Refresh only changed files based on content hash and parser version. +Include dependency invalidation hooks for future transclusion/reference graphs. 
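One possible shape for such an invalidation hook, assuming dependency edges are recorded as "unit depends on units" mappings, is sketched below. The function name `invalidated_units` and the example paths are hypothetical.

```python
from collections import defaultdict, deque


def invalidated_units(changed, depends_on):
    """Return every unit that must be refreshed when `changed` units change.

    `depends_on` maps a unit ID to the unit IDs it includes or references.
    The result contains the changed units plus all transitive dependents.
    """
    # Invert the edges: for each unit, which units depend on it?
    dependents = defaultdict(set)
    for unit, targets in depends_on.items():
        for target in targets:
            dependents[target].add(unit)

    stale = set(changed)
    queue = deque(stale)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in stale:
                stale.add(dependent)
                queue.append(dependent)
    return stale


# Hypothetical include graph for illustration.
depends_on = {
    "guide.md": ["clauses/liability.md", "src/config.py"],
    "overview.md": ["guide.md"],
    "faq.md": [],
}
print(sorted(invalidated_units({"src/config.py"}, depends_on)))
```

A content-hash check would still decide which files count as changed; this traversal only widens that set to their dependents.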
+ ## P7.7 - Add local index CLI ```task diff --git a/workplans/MKTT-WP-0010-content-reference-processor-literate-workflows.md b/workplans/MKTT-WP-0010-content-reference-processor-literate-workflows.md new file mode 100644 index 0000000..19d6573 --- /dev/null +++ b/workplans/MKTT-WP-0010-content-reference-processor-literate-workflows.md @@ -0,0 +1,170 @@ +--- +id: MKTT-WP-0010 +type: workplan +title: "Content References, Processors, and Literate Workflows" +domain: markitect +status: todo +owner: markitect-tool +topic_slug: markitect +planning_priority: P1 +planning_order: 55 +depends_on_workplans: + - MKTT-WP-0004 +depends_on_tasks: + - MKTT-WP-0003-T006 +informs_workplans: + - MKTT-WP-0006 + - MKTT-WP-0007 + - MKTT-WP-0008 + - MKTT-WP-0009 +created: "2026-05-04" +updated: "2026-05-04" +state_hub_workstream_id: "7863fd01-0be0-4dbc-9941-0151365bb9e1" +--- + +# MKTT-WP-0010: Content References, Processors, and Literate Workflows + +## Purpose + +Build the richer framework layer for namespaced content references, +processor-driven Markdown blocks, reversible explode/implode, and literate +weave/tangle workflows. + +This preserves the important `markitect-main` ideas that were intentionally not +pulled into the first transform/compose/include slice. + +## Background + +The first P3.5 implementation provides a clean deterministic kernel: +frontmatter/body/heading transforms, file composition, and explicit include +markers. That is better for the core toolkit than the old platform-heavy +implementation, but it leaves out several valuable capabilities: + +- reversible explode/implode variants +- named chunks and Knuth-style weave/tangle +- scoped variables and conditional transclusion +- stable namespaces and content references +- processor semantics for fenced blocks +- reference graphs, provenance, and source maps +- deterministic multi-inheritance for content classes and overlays + +See `docs/content-reference-literate-workflow-research.md`. 
+ +## P10.1 - Define reference address model and namespace rules + +```task +id: MKTT-WP-0010-T001 +status: todo +priority: high +state_hub_task_id: "f70d2b9d-151b-46c6-9613-bd6bdbf164e7" +``` + +Define stable content-unit identities, namespace mappings, reference syntax, +resolver inputs/outputs, and error cases. + +Output: reference model docs, examples, and tests for path, namespace, selector, +and ID resolution. + +## P10.2 - Add token-safe transforms and operation provenance + +```task +id: MKTT-WP-0010-T002 +status: todo +priority: high +state_hub_task_id: "e35639b7-756f-4993-8b3c-2e58b23e0eca" +``` + +Make transform/include behavior aware of fenced blocks and parser tokens. Add +structured operation provenance, dependency edges, source spans, and diagnostics. + +Output: token-safe transform implementation and provenance result envelope. + +## P10.3 - Implement named regions and addressable block selectors + +```task +id: MKTT-WP-0010-T003 +status: todo +priority: high +state_hub_task_id: "98cafe28-a364-48f1-ae55-cb47c71d9441" +``` + +Support named Markdown/source regions, section IDs, fenced block IDs, and region +selection by ID/tag/line range where appropriate. + +Output: region parser/resolver, CLI examples, and source-snippet tests. + +## P10.4 - Reimplement reversible explode/implode variants + +```task +id: MKTT-WP-0010-T004 +status: todo +priority: high +state_hub_task_id: "67f77aa1-a7ee-485c-891e-6ae7ecc52067" +``` + +Recreate the useful `markitect-main` explode/implode functionality with a slim +variant interface and manifest-first reversibility. + +Initial variants: flat and hierarchical. Semantic variant can follow once the +reference and processor model is stable. + +Output: `mkt explode`, `mkt implode`, manifest schema, roundtrip tests. 
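A manifest-first roundtrip for the flat variant could be prototyped along these lines. This is an in-memory sketch under stated assumptions: the manifest fields, file naming, and function names are hypothetical, and fence detection is deliberately naive (it ignores fence length and `~~~` fences, which the real implementation should take from parser tokens).

```python
import hashlib


def _sha256(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def explode_flat(text):
    """Split Markdown into parts at `## ` headings outside fenced blocks.

    Returns a (manifest, files) pair; nothing is written to disk.
    """
    parts, current, in_fence = [], [], False
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # naive toggle, see lead-in
        elif not in_fence and line.startswith("## ") and current:
            parts.append("".join(current))
            current = []
        current.append(line)
    if current:
        parts.append("".join(current))

    files = {f"{index:03d}.md": part for index, part in enumerate(parts)}
    manifest = {
        "variant": "flat",
        "original_hash": _sha256(text),
        "entries": [
            {"path": path, "hash": _sha256(part)} for path, part in files.items()
        ],
    }
    return manifest, files


def implode(manifest, files):
    """Rebuild the document by concatenating parts in manifest order."""
    return "".join(files[entry["path"]] for entry in manifest["entries"])


doc = "# Title\n\nIntro.\n\n## One\n\n```text\n## not a heading\n```\n\n## Two\nBody.\n"
manifest, files = explode_flat(doc)
# An unedited explode/implode cycle should reproduce the original hash.
assert _sha256(implode(manifest, files)) == manifest["original_hash"]
```

Because parts keep their original line endings and order, the unedited roundtrip is byte-identical, which is the property the roundtrip tests would check.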
+ +## P10.5 - Define processor registry for fenced blocks + +```task +id: MKTT-WP-0010-T005 +status: todo +priority: high +state_hub_task_id: "eb7cde08-8a73-4163-ac54-19a2bc7b5f88" +``` + +Create a deterministic processor API for fenced blocks and directives. +Processors should receive content units, resolver access, context, and policy, +and return generated content/files, diagnostics, dependencies, and provenance. + +Output: processor registry API, deterministic built-in processors, and tests. + +## P10.6 - Implement literate weave/tangle MVP + +```task +id: MKTT-WP-0010-T006 +status: todo +priority: high +state_hub_task_id: "090fcc38-758b-4414-b941-40f217eb17ca" +``` + +Implement Markdown-native literate workflows: named code chunks, chunk +references, target files, tangling, and woven documentation with chunk +cross-references. + +Output: `mkt tangle`, `mkt weave`, chunk-reference diagnostics, examples. + +## P10.7 - Design content class composition and multi-inheritance + +```task +id: MKTT-WP-0010-T007 +status: todo +priority: medium +state_hub_task_id: "220e6b27-2d7b-4c22-b5e8-304198ecfea8" +``` + +Define content classes, slots, merge policies, and deterministic +multi-inheritance resolution. Use a C3-like linearization with clear conflict +diagnostics. + +Output: architecture note, examples, and a small deterministic resolver spike. + +## P10.8 - Add migration examples from markitect-main + +```task +id: MKTT-WP-0010-T008 +status: todo +priority: high +state_hub_task_id: "287637d3-1997-43b2-b97d-10587d565cec" +``` + +Translate the relevant old explode/implode, transclusion, and spaces reference +graph tests into successor-style fixtures and examples. + +Output: migration test inventory, example documents, and parity notes.