From 8260a665280769aa97725b141325b41de3a5f467 Mon Sep 17 00:00:00 2001 From: tegwick Date: Mon, 4 May 2026 01:22:45 +0200 Subject: [PATCH] Workplan for dataflow pipeline workflows --- docs/markdown-dataflow-workflow-assessment.md | 145 ++++++++++++++ docs/workplan-planning-map.md | 8 + ...-runtime-context-and-assessment-engines.md | 4 +- ...WP-0006-cache-backend-architecture-core.md | 1 + ...-reference-processor-literate-workflows.md | 1 + ...11-markdown-dataflow-pipeline-workflows.md | 179 ++++++++++++++++++ 6 files changed, 337 insertions(+), 1 deletion(-) create mode 100644 docs/markdown-dataflow-workflow-assessment.md create mode 100644 workplans/MKTT-WP-0011-markdown-dataflow-pipeline-workflows.md diff --git a/docs/markdown-dataflow-workflow-assessment.md b/docs/markdown-dataflow-workflow-assessment.md new file mode 100644 index 0000000..865620f --- /dev/null +++ b/docs/markdown-dataflow-workflow-assessment.md @@ -0,0 +1,145 @@ +# Markdown Dataflow Workflow Assessment + +Date: 2026-05-04 + +## Question + +Can `markitect-tool` support workflows that grab data from one or more Markdown +documents, process it deterministically or with optional LLM assistance, and +inject the result into one or more Markdown outputs? + +## Short Answer + +Partially today, but not yet as a clean framework. + +The current implementation provides the right primitives: + +- parse Markdown and frontmatter +- query/extract structured content +- transform documents +- compose files +- include/transclude content +- render deterministic templates +- generate stubs from contracts +- run simple generation plans +- expose a provider-neutral assisted-generation hook + +However, the user still has to orchestrate these steps manually through shell +commands or Python code. 
There is no declarative pipeline model that says: + +```text +sources -> extracted data products -> deterministic processors -> assisted +processors -> templates/generation -> multiple outputs +``` + +That missing layer is where markitect can become much more practical. + +## Comparison with markitect-main + +`markitect-main` had more separate experiments: + +- template parser and renderer +- data-driven draft generation +- prompt/LLM quality gates +- transclusion with variables and conditionals +- batch/document-processing commands +- infospace and spaces workflows +- cache/reference graph ideas + +The new implementation is better in its core shape: + +- smaller, provider-neutral modules +- deterministic behavior before optional LLM use +- CLI/API parity +- contracts as a stronger rule source than raw schemas +- structured query/extract feeding generation +- explicit safety boundaries for includes +- tests around each primitive + +What we sacrificed: + +- no first-class batch/pipeline runner yet +- no prompt/LLM workflow execution in core +- no variable/conditional transclusion yet +- no data-driven multi-record draft generator yet +- no workflow provenance graph tying inputs to outputs +- no multi-output orchestration +- no built-in object/data shaping between extraction and rendering + +This is a good trade for the foundation, but the pipeline layer needs to exist. 
+ +## Desired Workflow Shape + +A future pipeline plan should be Markdown-native and inspectable: + +```markdown +# Release Note Pipeline + +```yaml workflow +sources: + decisions: + glob: docs/adr/*.md + extract: + accepted: + selector: sections[heading=Decision] + status: + selector: frontmatter.status + +steps: + summarize: + kind: deterministic.template + template: templates/release-summary.md + data: + decisions: ${sources.decisions.accepted} + + assisted_review: + kind: assisted.generation + input: ${steps.summarize.markdown} + prompt: prompts/reviewer.md + optional: true + +outputs: + release_notes: + template: templates/release-notes.md + data: + summary: ${steps.summarize.markdown} + review: ${steps.assisted_review.markdown} + output: out/release-notes.md +``` +``` + +This should remain executable without LLM support. Assisted steps should be +optional, externally supplied, and policy-aware. + +## Architecture Gap + +The missing generalized layer needs: + +- source collectors for Markdown files, globs, directories, and future indexes +- named extracted data products +- a small data expression model for referencing previous results +- deterministic step registry +- optional assisted step registry +- multi-output sinks +- provenance and diagnostics per step +- dry-run/plan/inspect modes +- caching and invalidation hooks +- policy hooks before assisted steps or sensitive output writes + +## Relationship to Existing Workplans + +- `MKTT-WP-0003` gives the primitive surface. +- `MKTT-WP-0010` gives richer references, processors, regions, and chunks. +- `MKTT-WP-0006` gives backend/provenance/cache interfaces. +- `MKTT-WP-0005` gives runtime context and form/assessment engines. +- `MKTT-WP-0011` should become the declarative pipeline/workflow layer that + wires these together. + +## Recommendation + +Do not squeeze this into P3.7. P3.7 should stay focused on lightweight caching +and incremental processing for the current primitives. 
+ +Create a new workplan for declarative Markdown dataflow pipelines. It should be +P1/P2: important enough not to forget, but best implemented after the reference +and processor model has at least its first architecture pass. diff --git a/docs/workplan-planning-map.md b/docs/workplan-planning-map.md index 5245300..818293a 100644 --- a/docs/workplan-planning-map.md +++ b/docs/workplan-planning-map.md @@ -35,6 +35,7 @@ and descriptions mirror the operational view. | `MKTT-WP-0010` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Trigger is satisfied; keep as the richer content-reference, processor, explode/implode, and weave/tangle track. | | `MKTT-WP-0007` | P2 | todo | `MKTT-WP-0006` | First practical cache backend use case: AST/JSONPath/SQLite/FTS. | | `MKTT-WP-0005` | P2 | todo | `MKTT-WP-0003`, `MKTT-WP-0004` | Pick up when generation/form/context or semantic assessment pressure appears. | +| `MKTT-WP-0011` | P2 | todo | `MKTT-WP-0003`; task-level triggers: `MKTT-WP-0010-T001`, `MKTT-WP-0010-T005` | Declarative Markdown dataflow workflows: source extraction, deterministic/assisted processing, and multi-output generation. | | `MKTT-WP-0009` | P2 | todo | `MKTT-WP-0006` | Establish access-control gateway before security-sensitive cache/context use. | | `MKTT-WP-0008` | P3 | todo | `MKTT-WP-0006`, `MKTT-WP-0007`, `MKTT-WP-0009` | Agent working-memory cache after backend and policy floor are available. | @@ -53,6 +54,12 @@ context-memory, and access-control architecture before those become rigid. These are mixed task/workstream dependencies. State Hub does not currently model them natively. +`MKTT-WP-0011` captures the practical workflow layer that wires existing +primitives together: Markdown sources, selectors, deterministic processors, +optional assisted generation hooks, and multiple Markdown outputs. It should not +block P3.7, but it should follow the first reference model and processor +registry decisions in `MKTT-WP-0010`. 
+ ## State Hub Mirror Native State Hub dependency edges should mirror the whole-workstream @@ -69,6 +76,7 @@ dependencies: - `MKTT-WP-0007 -> MKTT-WP-0006` - `MKTT-WP-0005 -> MKTT-WP-0003` - `MKTT-WP-0005 -> MKTT-WP-0004` +- `MKTT-WP-0011 -> MKTT-WP-0003` - `MKTT-WP-0009 -> MKTT-WP-0006` - `MKTT-WP-0008 -> MKTT-WP-0006` - `MKTT-WP-0008 -> MKTT-WP-0007` diff --git a/workplans/MKTT-WP-0005-runtime-context-and-assessment-engines.md b/workplans/MKTT-WP-0005-runtime-context-and-assessment-engines.md index e8ab2ca..f68f358 100644 --- a/workplans/MKTT-WP-0005-runtime-context-and-assessment-engines.md +++ b/workplans/MKTT-WP-0005-runtime-context-and-assessment-engines.md @@ -11,8 +11,10 @@ planning_order: 70 depends_on_workplans: - MKTT-WP-0003 - MKTT-WP-0004 +related_workplans: + - MKTT-WP-0011 created: "2026-05-03" -updated: "2026-05-03" +updated: "2026-05-04" state_hub_workstream_id: "7918687e-2364-46b1-ab7e-65aa77cb8449" --- diff --git a/workplans/MKTT-WP-0006-cache-backend-architecture-core.md b/workplans/MKTT-WP-0006-cache-backend-architecture-core.md index d82578e..d9661a3 100644 --- a/workplans/MKTT-WP-0006-cache-backend-architecture-core.md +++ b/workplans/MKTT-WP-0006-cache-backend-architecture-core.md @@ -14,6 +14,7 @@ depends_on_tasks: - MKTT-WP-0003-T005 related_workplans: - MKTT-WP-0010 + - MKTT-WP-0011 created: "2026-05-03" updated: "2026-05-04" state_hub_workstream_id: "0c585f8a-5c7e-4c89-b785-5b0089180256" diff --git a/workplans/MKTT-WP-0010-content-reference-processor-literate-workflows.md b/workplans/MKTT-WP-0010-content-reference-processor-literate-workflows.md index 19d6573..132a30d 100644 --- a/workplans/MKTT-WP-0010-content-reference-processor-literate-workflows.md +++ b/workplans/MKTT-WP-0010-content-reference-processor-literate-workflows.md @@ -17,6 +17,7 @@ informs_workplans: - MKTT-WP-0007 - MKTT-WP-0008 - MKTT-WP-0009 + - MKTT-WP-0011 created: "2026-05-04" updated: "2026-05-04" state_hub_workstream_id: "7863fd01-0be0-4dbc-9941-0151365bb9e1" diff 
--git a/workplans/MKTT-WP-0011-markdown-dataflow-pipeline-workflows.md b/workplans/MKTT-WP-0011-markdown-dataflow-pipeline-workflows.md new file mode 100644 index 0000000..7cf8d96 --- /dev/null +++ b/workplans/MKTT-WP-0011-markdown-dataflow-pipeline-workflows.md @@ -0,0 +1,179 @@ +--- +id: MKTT-WP-0011 +type: workplan +title: "Markdown Dataflow Pipeline Workflows" +domain: markitect +status: todo +owner: markitect-tool +topic_slug: markitect +planning_priority: P2 +planning_order: 75 +depends_on_workplans: + - MKTT-WP-0003 +depends_on_tasks: + - MKTT-WP-0010-T001 + - MKTT-WP-0010-T005 +related_workplans: + - MKTT-WP-0005 + - MKTT-WP-0006 + - MKTT-WP-0008 + - MKTT-WP-0009 +created: "2026-05-04" +updated: "2026-05-04" +state_hub_workstream_id: "ed4c491d-4f81-4df0-af51-5f4bd4d1ad91" +--- + +# MKTT-WP-0011: Markdown Dataflow Pipeline Workflows + +## Purpose + +Create a declarative workflow layer for Markdown-to-Markdown dataflow: +collecting data from one or more Markdown sources, applying deterministic and +optional assisted processing, and injecting the results into one or more +Markdown outputs. + +## Background + +The current toolkit has strong primitives: parse, query, extract, transform, +compose, include, template render, contract stub generation, generation plans, +and a provider-neutral assisted-generation hook. + +What is missing is orchestration. Users can script the pieces manually, but +there is not yet a first-class workflow model for: + +```text +Markdown sources -> extracted data products -> processors -> generated outputs +``` + +See `docs/markdown-dataflow-workflow-assessment.md`. + +## P11.1 - Define workflow plan model + +```task +id: MKTT-WP-0011-T001 +status: todo +priority: high +state_hub_task_id: "c335cbaa-dfb9-4df5-b1ae-87aaf6097bd8" +``` + +Define a Markdown/YAML workflow plan format with sources, named data products, +steps, outputs, variables, dry-run behavior, diagnostics, and provenance. 
+ +Output: workflow schema, examples, and validation diagnostics. + +## P11.2 - Implement Markdown source collectors + +```task +id: MKTT-WP-0011-T002 +status: todo +priority: high +state_hub_task_id: "16a89801-d96d-437f-a883-81d09586f47a" +``` + +Collect source data from files, globs, directories, frontmatter paths, +selectors, sections, blocks, metrics, and future reference/index backends. + +Output: source collector API, selector integration, and tests. + +## P11.3 - Implement deterministic step registry + +```task +id: MKTT-WP-0011-T003 +status: todo +priority: high +state_hub_task_id: "808bed93-c7e2-4b34-90f4-f6f961fef503" +``` + +Create step types for query/extract, transform, compose, include, template +render, contract stub generation, contract checks, and data shaping. + +Output: deterministic workflow runner with dependency ordering. + +## P11.4 - Implement data expression and binding model + +```task +id: MKTT-WP-0011-T004 +status: todo +priority: high +state_hub_task_id: "ea1ad9d2-3668-4b65-afb4-f490e5bfd0c6" +``` + +Allow workflow steps and outputs to reference previous results by stable names, +for example `${sources.adrs.decisions}` or `${steps.summary.markdown}`. + +Output: expression resolver, type checks, and missing-reference diagnostics. + +## P11.5 - Add optional assisted processing step boundary + +```task +id: MKTT-WP-0011-T005 +status: todo +priority: medium +state_hub_task_id: "ed1adc60-fdd8-4d4c-b4d7-7ce906e641c6" +``` + +Add assisted step support through the provider-neutral generation hook protocol. +The workflow engine must not require provider dependencies and must support +dry-run, optional steps, and policy gates before sending data to a provider. + +Output: hook adapter interface and tests with fake providers. 
+ +## P11.6 - Implement multi-output sinks + +```task +id: MKTT-WP-0011-T006 +status: todo +priority: high +state_hub_task_id: "902707d7-46fe-45d6-a9ec-b85763065ff9" +``` + +Support writing one or many Markdown outputs from templates, generated content, +or composed results. Outputs must be path-safe, reproducible, and traceable to +their source data. + +Output: output sink API, path-safety checks, and provenance manifests. + +## P11.7 - Add workflow CLI + +```task +id: MKTT-WP-0011-T007 +status: todo +priority: high +state_hub_task_id: "ccc26867-5724-4205-b3fe-a8b9d046775d" +``` + +Add: + +```text +mkt workflow inspect +mkt workflow plan +mkt workflow run +``` + +Include JSON/YAML outputs for agent use. + +## P11.8 - Add representative end-to-end examples + +```task +id: MKTT-WP-0011-T008 +status: todo +priority: high +state_hub_task_id: "f8501ea6-1ead-477d-8f64-c196e7edfe68" +``` + +Create examples covering: + +- multiple ADRs -> release notes +- contract data -> generated documents +- source snippets -> docs +- deterministic summary -> optional assisted review -> final Markdown + +## Exit Criteria + +- A non-programmer can write a Markdown/YAML workflow that extracts data from + Markdown documents and generates new Markdown outputs. +- The same workflow is repeatable for identical inputs. +- Assisted steps are optional and external. +- Diagnostics identify which source, step, or output failed. +- The implementation remains compatible with future references/processors, + cache/provenance, context engines, and access-control policy.
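As an illustration of the binding model that P11.4 describes, the following is a minimal sketch of how references such as `${sources.adrs.decisions}` or `${steps.summary.markdown}` might be resolved against earlier results. It is not part of this patch. The names `resolve`, `lookup`, and `MissingReference`, and the shape of the `results` mapping, are assumptions made for this example rather than existing `markitect-tool` APIs.

```python
import re
from typing import Any, Mapping

# Assumed reference syntax: ${root.key.key}, e.g. ${steps.summary.markdown}.
_REF = re.compile(r"\$\{([A-Za-z_][\w-]*(?:\.[A-Za-z_][\w-]*)*)\}")


class MissingReference(LookupError):
    """Hypothetical diagnostic raised when a reference names no known result."""


def lookup(path: str, results: Mapping[str, Any]) -> Any:
    """Walk a dotted path through nested mappings and report where it breaks."""
    current: Any = results
    walked = []
    for part in path.split("."):
        if not isinstance(current, Mapping) or part not in current:
            prefix = ".".join(walked) or "<root>"
            raise MissingReference(f"${{{path}}}: no '{part}' under {prefix}")
        current = current[part]
        walked.append(part)
    return current


def resolve(value: Any, results: Mapping[str, Any]) -> Any:
    """Resolve references found in strings, lists, and dicts.

    A string that is exactly one reference keeps the referenced value's type;
    references embedded in longer text are interpolated as strings.
    """
    if isinstance(value, str):
        whole = _REF.fullmatch(value)
        if whole:
            return lookup(whole.group(1), results)
        return _REF.sub(lambda m: str(lookup(m.group(1), results)), value)
    if isinstance(value, list):
        return [resolve(item, results) for item in value]
    if isinstance(value, dict):
        return {key: resolve(item, results) for key, item in value.items()}
    return value
```

Under these assumptions, a runner would call `resolve(step["data"], results)` before executing each step, so that a misspelled name fails with a message naming both the reference and the last path segment that did resolve.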