generated from coulomb/repo-seed
Workplan for dataflow pipeline workflows
145
docs/markdown-dataflow-workflow-assessment.md
Normal file
@@ -0,0 +1,145 @@
# Markdown Dataflow Workflow Assessment

Date: 2026-05-04

## Question

Can `markitect-tool` support workflows that grab data from one or more Markdown
documents, process it deterministically or with optional LLM assistance, and
inject the result into one or more Markdown outputs?

## Short Answer

Partially today, but not yet as a clean framework.

The current implementation provides the right primitives:

- parse Markdown and frontmatter
- query/extract structured content
- transform documents
- compose files
- include/transclude content
- render deterministic templates
- generate stubs from contracts
- run simple generation plans
- expose a provider-neutral assisted-generation hook

However, the user still has to orchestrate these steps manually through shell
commands or Python code. There is no declarative pipeline model that says:

```text
sources -> extracted data products -> deterministic processors -> assisted
processors -> templates/generation -> multiple outputs
```

That missing layer is where markitect can become much more practical.

## Comparison with markitect-main

`markitect-main` had more separate experiments:

- template parser and renderer
- data-driven draft generation
- prompt/LLM quality gates
- transclusion with variables and conditionals
- batch/document-processing commands
- infospace and spaces workflows
- cache/reference graph ideas

The new implementation is better in its core shape:

- smaller, provider-neutral modules
- deterministic behavior before optional LLM use
- CLI/API parity
- contracts as a stronger rule source than raw schemas
- structured query/extract feeding generation
- explicit safety boundaries for includes
- tests around each primitive

What we sacrificed:

- no first-class batch/pipeline runner yet
- no prompt/LLM workflow execution in core
- no variable/conditional transclusion yet
- no data-driven multi-record draft generator yet
- no workflow provenance graph tying inputs to outputs
- no multi-output orchestration
- no built-in object/data shaping between extraction and rendering

This is a good trade for the foundation, but the pipeline layer needs to exist.

## Desired Workflow Shape

A future pipeline plan should be Markdown-native and inspectable:

````markdown
# Release Note Pipeline

```yaml workflow
sources:
  decisions:
    glob: docs/adr/*.md
    extract:
      accepted:
        selector: sections[heading=Decision]
      status:
        selector: frontmatter.status

steps:
  summarize:
    kind: deterministic.template
    template: templates/release-summary.md
    data:
      decisions: ${sources.decisions.accepted}

  assisted_review:
    kind: assisted.generation
    input: ${steps.summarize.markdown}
    prompt: prompts/reviewer.md
    optional: true

outputs:
  release_notes:
    template: templates/release-notes.md
    data:
      summary: ${steps.summarize.markdown}
      review: ${steps.assisted_review.markdown}
    output: out/release-notes.md
```
````

This should remain executable without LLM support. Assisted steps should be
optional, externally supplied, and policy-aware.

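Since the plan is embedded in a fenced block, a runner first has to find that
block in the Markdown file. The following is a minimal sketch, assuming the
plan fence uses three backticks at column 0 with the info string
`yaml workflow` as in the example; `find_workflow_blocks` is a hypothetical
helper, not an existing `markitect-tool` function. It returns the raw YAML text
and leaves parsing to whichever YAML loader the project adopts.

```python
import re

FENCE = "`" * 3  # three backticks, built here to keep this example fence-safe

# Assumption: the plan block starts at column 0 with the info string
# "yaml workflow" and ends at the next line holding only a closing fence.
_PLAN_BLOCK = re.compile(
    rf"^{FENCE}yaml workflow[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def find_workflow_blocks(markdown: str) -> list[str]:
    """Return the raw YAML text of every 'yaml workflow' fenced block."""
    return [match.group(1) for match in _PLAN_BLOCK.finditer(markdown)]


doc = "\n".join([
    "# Release Note Pipeline",
    "",
    FENCE + "yaml workflow",
    "sources:",
    "  decisions:",
    "    glob: docs/adr/*.md",
    FENCE,
    "",
    "Prose after the plan is ignored.",
])

blocks = find_workflow_blocks(doc)
print(blocks[0])
```

A regular expression is enough for a sketch; a real implementation would more
likely reuse the existing Markdown parser so that nested or indented fences are
handled consistently.
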
## Architecture Gap

The missing generalized layer needs:

- source collectors for Markdown files, globs, directories, and future indexes
- named extracted data products
- a small data expression model for referencing previous results
- deterministic step registry
- optional assisted step registry
- multi-output sinks
- provenance and diagnostics per step
- dry-run/plan/inspect modes
- caching and invalidation hooks
- policy hooks before assisted steps or sensitive output writes

## Relationship to Existing Workplans

- `MKTT-WP-0003` gives the primitive surface.
- `MKTT-WP-0010` gives richer references, processors, regions, and chunks.
- `MKTT-WP-0006` gives backend/provenance/cache interfaces.
- `MKTT-WP-0005` gives runtime context and form/assessment engines.
- `MKTT-WP-0011` should become the declarative pipeline/workflow layer that
  wires these together.

## Recommendation

Do not squeeze this into P3.7. P3.7 should stay focused on lightweight caching
and incremental processing for the current primitives.

Create a new workplan for declarative Markdown dataflow pipelines. It should be
P1/P2: important enough not to forget, but best implemented after the reference
and processor model has at least its first architecture pass.

@@ -35,6 +35,7 @@ and descriptions mirror the operational view.
| `MKTT-WP-0010` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Trigger is satisfied; keep as the richer content-reference, processor, explode/implode, and weave/tangle track. |
| `MKTT-WP-0007` | P2 | todo | `MKTT-WP-0006` | First practical cache backend use case: AST/JSONPath/SQLite/FTS. |
| `MKTT-WP-0005` | P2 | todo | `MKTT-WP-0003`, `MKTT-WP-0004` | Pick up when generation/form/context or semantic assessment pressure appears. |
| `MKTT-WP-0011` | P2 | todo | `MKTT-WP-0003`; task-level triggers: `MKTT-WP-0010-T001`, `MKTT-WP-0010-T005` | Declarative Markdown dataflow workflows: source extraction, deterministic/assisted processing, and multi-output generation. |
| `MKTT-WP-0009` | P2 | todo | `MKTT-WP-0006` | Establish access-control gateway before security-sensitive cache/context use. |
| `MKTT-WP-0008` | P3 | todo | `MKTT-WP-0006`, `MKTT-WP-0007`, `MKTT-WP-0009` | Agent working-memory cache after backend and policy floor are available. |

@@ -53,6 +54,12 @@ context-memory, and access-control architecture before those become rigid.
These are mixed task/workstream dependencies. State Hub does not currently model
them natively.

`MKTT-WP-0011` captures the practical workflow layer that wires existing
primitives together: Markdown sources, selectors, deterministic processors,
optional assisted generation hooks, and multiple Markdown outputs. It should not
block P3.7, but it should follow the first reference model and processor
registry decisions in `MKTT-WP-0010`.

## State Hub Mirror

Native State Hub dependency edges should mirror the whole-workstream
@@ -69,6 +76,7 @@ dependencies:
- `MKTT-WP-0007 -> MKTT-WP-0006`
- `MKTT-WP-0005 -> MKTT-WP-0003`
- `MKTT-WP-0005 -> MKTT-WP-0004`
- `MKTT-WP-0011 -> MKTT-WP-0003`
- `MKTT-WP-0009 -> MKTT-WP-0006`
- `MKTT-WP-0008 -> MKTT-WP-0006`
- `MKTT-WP-0008 -> MKTT-WP-0007`
@@ -11,8 +11,10 @@ planning_order: 70
 depends_on_workplans:
 - MKTT-WP-0003
 - MKTT-WP-0004
+related_workplans:
+- MKTT-WP-0011
 created: "2026-05-03"
-updated: "2026-05-03"
+updated: "2026-05-04"
 state_hub_workstream_id: "7918687e-2364-46b1-ab7e-65aa77cb8449"
 ---

@@ -14,6 +14,7 @@ depends_on_tasks:
- MKTT-WP-0003-T005
related_workplans:
- MKTT-WP-0010
- MKTT-WP-0011
created: "2026-05-03"
updated: "2026-05-04"
state_hub_workstream_id: "0c585f8a-5c7e-4c89-b785-5b0089180256"
@@ -17,6 +17,7 @@ informs_workplans:
- MKTT-WP-0007
- MKTT-WP-0008
- MKTT-WP-0009
- MKTT-WP-0011
created: "2026-05-04"
updated: "2026-05-04"
state_hub_workstream_id: "7863fd01-0be0-4dbc-9941-0151365bb9e1"
179
workplans/MKTT-WP-0011-markdown-dataflow-pipeline-workflows.md
Normal file
@@ -0,0 +1,179 @@
---
id: MKTT-WP-0011
type: workplan
title: "Markdown Dataflow Pipeline Workflows"
domain: markitect
status: todo
owner: markitect-tool
topic_slug: markitect
planning_priority: P2
planning_order: 75
depends_on_workplans:
- MKTT-WP-0003
depends_on_tasks:
- MKTT-WP-0010-T001
- MKTT-WP-0010-T005
related_workplans:
- MKTT-WP-0005
- MKTT-WP-0006
- MKTT-WP-0008
- MKTT-WP-0009
created: "2026-05-04"
updated: "2026-05-04"
state_hub_workstream_id: "ed4c491d-4f81-4df0-af51-5f4bd4d1ad91"
---

# MKTT-WP-0011: Markdown Dataflow Pipeline Workflows

## Purpose

Create a declarative workflow layer for Markdown-to-Markdown dataflow:
collecting data from one or more Markdown sources, applying deterministic and
optional assisted processing, and injecting the results into one or more
Markdown outputs.

## Background

The current toolkit has strong primitives: parse, query, extract, transform,
compose, include, template render, contract stub generation, generation plans,
and a provider-neutral assisted-generation hook.

What is missing is orchestration. Users can script the pieces manually, but
there is not yet a first-class workflow model for:

```text
Markdown sources -> extracted data products -> processors -> generated outputs
```

See `docs/markdown-dataflow-workflow-assessment.md`.

## P11.1 - Define workflow plan model

```task
id: MKTT-WP-0011-T001
status: todo
priority: high
state_hub_task_id: "c335cbaa-dfb9-4df5-b1ae-87aaf6097bd8"
```

Define a Markdown/YAML workflow plan format with sources, named data products,
steps, outputs, variables, dry-run behavior, diagnostics, and provenance.

Output: workflow schema, examples, and validation diagnostics.

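To make "schema plus validation diagnostics" concrete, here is a rough sketch
of an in-memory plan model. The class names `Step` and `WorkflowPlan`, the
`validate` function, and the specific rules it checks are assumptions made for
illustration; the actual schema is what this task is meant to define.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    kind: str               # e.g. "deterministic.template"
    optional: bool = False


@dataclass
class WorkflowPlan:
    sources: dict[str, dict] = field(default_factory=dict)
    steps: dict[str, Step] = field(default_factory=dict)
    outputs: dict[str, dict] = field(default_factory=dict)


def validate(plan: WorkflowPlan) -> list[str]:
    """Return diagnostics keyed by plan location; an empty list means valid."""
    diagnostics = []
    if not plan.sources:
        diagnostics.append("plan: at least one source is required")
    if not plan.outputs:
        diagnostics.append("plan: at least one output is required")
    for name, step in plan.steps.items():
        family = step.kind.split(".", 1)[0]
        if family not in {"deterministic", "assisted"}:
            diagnostics.append(f"steps.{name}: unknown kind {step.kind!r}")
        # Assumed rule: assisted steps must be declared optional.
        if family == "assisted" and not step.optional:
            diagnostics.append(f"steps.{name}: assisted steps must be optional")
    for name, output in plan.outputs.items():
        if "output" not in output:
            diagnostics.append(f"outputs.{name}: missing 'output' path")
    return diagnostics


plan = WorkflowPlan(
    sources={"decisions": {"glob": "docs/adr/*.md"}},
    steps={
        "summarize": Step(kind="deterministic.template"),
        "assisted_review": Step(kind="assisted.generation"),
    },
    outputs={"release_notes": {"template": "templates/release-notes.md"}},
)
for line in validate(plan):
    print(line)
```

Returning a list of located messages, rather than raising on the first
problem, lets an `inspect` style command report every issue in one pass.
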
## P11.2 - Implement Markdown source collectors

```task
id: MKTT-WP-0011-T002
status: todo
priority: high
state_hub_task_id: "16a89801-d96d-437f-a883-81d09586f47a"
```

Collect source data from files, globs, directories, frontmatter paths,
selectors, sections, blocks, metrics, and future reference/index backends.

Output: source collector API, selector integration, and tests.

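As an illustration of what a collector does, the sketch below gathers
frontmatter and one named section from every file matching a glob. It uses
only the standard library and deliberately simplifies: frontmatter is read as
flat `key: value` lines, any later heading ends a section, and `#` lines inside
code fences are not special-cased. The function names are hypothetical; a real
collector would presumably delegate to the existing parse and selector
primitives.

```python
from pathlib import Path
from typing import Optional


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a leading '---' block into flat key/value pairs and the body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            meta = {}
            for line in lines[1:end]:
                key, sep, value = line.partition(":")
                if sep:
                    meta[key.strip()] = value.strip()
            return meta, "\n".join(lines[end + 1:])
    return {}, text  # no closing delimiter: treat everything as body


def section_text(body: str, heading: str) -> Optional[str]:
    """Return the text under the first heading titled `heading`, if any."""
    collected = None
    for line in body.splitlines():
        if line.startswith("#"):
            if collected is not None:
                break  # the next heading ends the section
            if line.lstrip("#").strip() == heading:
                collected = []
        elif collected is not None:
            collected.append(line)
    return None if collected is None else "\n".join(collected).strip()


def collect(root: Path, pattern: str, heading: str) -> list[dict]:
    """Collect frontmatter and one named section from every matching file."""
    records = []
    for path in sorted(root.glob(pattern)):  # sorted for deterministic order
        meta, body = split_frontmatter(path.read_text(encoding="utf-8"))
        records.append({
            "path": path.relative_to(root).as_posix(),
            "frontmatter": meta,
            "section": section_text(body, heading),
        })
    return records


meta, body = split_frontmatter(
    "---\nstatus: accepted\n---\n## Decision\n\nUse Markdown.\n"
)
print(meta["status"], "->", section_text(body, "Decision"))
```

Sorting the glob results keeps the collected records in a stable order, which
matters for the repeatability goal in the exit criteria.
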
## P11.3 - Implement deterministic step registry

```task
id: MKTT-WP-0011-T003
status: todo
priority: high
state_hub_task_id: "808bed93-c7e2-4b34-90f4-f6f961fef503"
```

Create step types for query/extract, transform, compose, include, template
render, contract stub generation, contract checks, and data shaping.

Output: deterministic workflow runner with dependency ordering.

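One way to get dependency ordering is a topological sort over the names each
step consumes. The sketch below uses Python's standard `graphlib` module; the
`needs` key, the `register` decorator, and the two toy step kinds are
assumptions for illustration and do not reflect an existing registry.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable

# Hypothetical registry: maps a step kind to a pure function of its inputs.
STEP_KINDS: dict[str, Callable[[dict], object]] = {}


def register(kind: str):
    """Decorator that adds a function to the registry under `kind`."""
    def decorator(func):
        STEP_KINDS[kind] = func
        return func
    return decorator


@register("deterministic.join")
def join_lines(inputs: dict) -> str:
    return "\n".join(str(inputs[name]) for name in sorted(inputs))


@register("deterministic.upper")
def upper(inputs: dict) -> str:
    return "\n".join(str(inputs[name]).upper() for name in sorted(inputs))


def run(steps: dict[str, dict], seed: dict[str, object]) -> dict[str, object]:
    """Run steps in dependency order; `needs` lists the names a step consumes."""
    # Only other steps count as graph edges; remaining names come from `seed`.
    graph = {
        name: {need for need in spec.get("needs", ()) if need in steps}
        for name, spec in steps.items()
    }
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        cycle = " -> ".join(exc.args[1])
        raise ValueError(f"workflow has a dependency cycle: {cycle}") from exc
    results = dict(seed)
    for name in order:
        spec = steps[name]
        inputs = {need: results[need] for need in spec.get("needs", ())}
        results[name] = STEP_KINDS[spec["kind"]](inputs)
    return results


steps = {
    "shout": {"kind": "deterministic.upper", "needs": ["summary"]},
    "summary": {"kind": "deterministic.join", "needs": ["decisions"]},
}
results = run(steps, {"decisions": "Adopt Markdown plans"})
print(results["shout"])
```

Because the order comes from declared dependencies and not from the order in
which steps are written, the plan can list steps in any sequence.
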
## P11.4 - Implement data expression and binding model

```task
id: MKTT-WP-0011-T004
status: todo
priority: high
state_hub_task_id: "ea1ad9d2-3668-4b65-afb4-f490e5bfd0c6"
```

Allow workflow steps and outputs to reference previous results by stable names,
for example `${sources.adrs.decisions}` or `${steps.summary.markdown}`.

Output: expression resolver, type checks, and missing-reference diagnostics.

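A resolver for the `${...}` form can stay small. The sketch below assumes
dotted paths into a nested dictionary and nothing more (no filters, defaults,
or indexing); `resolve`, `lookup`, and `MissingReference` are hypothetical
names. A string that consists of a single reference keeps the referenced
value's type, while references embedded in longer text are interpolated as
strings.

```python
import re

_REF = re.compile(r"\$\{([A-Za-z_][\w.]*)\}")


class MissingReference(LookupError):
    """Raised when a ${...} path does not exist in the workflow context."""


def lookup(context: dict, path: str):
    """Walk a dotted path through nested dicts, reporting where it breaks."""
    current = context
    walked = []
    for part in path.split("."):
        walked.append(part)
        if not isinstance(current, dict) or part not in current:
            where = ".".join(walked)
            raise MissingReference(f"${{{path}}}: nothing at {where!r}")
        current = current[part]
    return current


def resolve(value, context: dict):
    """Resolve ${...} references inside strings, lists, and dicts."""
    if isinstance(value, dict):
        return {key: resolve(item, context) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item, context) for item in value]
    if not isinstance(value, str):
        return value
    whole = _REF.fullmatch(value)
    if whole:  # a bare reference keeps the referenced value's type
        return lookup(context, whole.group(1))
    return _REF.sub(lambda m: str(lookup(context, m.group(1))), value)


context = {
    "sources": {"adrs": {"decisions": ["Adopt Markdown plans"]}},
    "steps": {"summary": {"markdown": "One decision accepted."}},
}
data = resolve(
    {
        "items": "${sources.adrs.decisions}",
        "note": "Summary: ${steps.summary.markdown}",
    },
    context,
)
print(data)
```

The error message names both the full expression and the first path segment
that could not be found, which is the kind of missing-reference diagnostic this
task calls for.
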
## P11.5 - Add optional assisted processing step boundary

```task
id: MKTT-WP-0011-T005
status: todo
priority: medium
state_hub_task_id: "ed1adc60-fdd8-4d4c-b4d7-7ce906e641c6"
```

Add assisted step support through the provider-neutral generation hook protocol.
The workflow engine must not require provider dependencies and must support
dry-run, optional steps, and policy gates before sending data to a provider.

Output: hook adapter interface and tests with fake providers.

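The boundary described here can be sketched as a structural protocol plus a
gate that runs before any data leaves the engine. Everything named below
(`GenerationHook`, `StepResult`, `run_assisted_step`, `FakeProvider`,
`no_secrets`) is hypothetical; the existing generation hook protocol in
`markitect-tool` may have a different shape.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


class GenerationHook(Protocol):
    """Assumed hook shape; no provider library is imported anywhere here."""

    def generate(self, prompt: str, input_markdown: str) -> str: ...


@dataclass
class StepResult:
    status: str                      # "ran", "skipped", or "planned"
    markdown: Optional[str] = None
    reason: str = ""


def run_assisted_step(
    prompt: str,
    input_markdown: str,
    hook: Optional[GenerationHook],
    policy: Callable[[str], bool],
    optional: bool = True,
    dry_run: bool = False,
) -> StepResult:
    """Run one assisted step; the hook sees data only if the policy allows."""
    if dry_run:
        return StepResult("planned", reason="dry run: provider not called")
    if hook is None:
        if optional:
            return StepResult("skipped", reason="no provider configured")
        raise RuntimeError("required assisted step has no provider")
    if not policy(input_markdown):
        if optional:
            return StepResult("skipped", reason="blocked by policy")
        raise PermissionError("policy blocked a required assisted step")
    return StepResult("ran", markdown=hook.generate(prompt, input_markdown))


class FakeProvider:
    """Test double that records calls instead of contacting a model."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def generate(self, prompt: str, input_markdown: str) -> str:
        self.calls.append(input_markdown)
        return f"Reviewed: {input_markdown}"


def no_secrets(text: str) -> bool:
    """Toy policy: refuse any input containing the marker 'SECRET'."""
    return "SECRET" not in text


fake = FakeProvider()
ok = run_assisted_step("Review this.", "Release summary", fake, no_secrets)
blocked = run_assisted_step("Review this.", "SECRET token", fake, no_secrets)
print(ok.status, blocked.status, fake.calls)
```

Checking the policy before calling the hook means a blocked input never
reaches the provider, and the fake's call log makes that easy to assert in
tests.
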
## P11.6 - Implement multi-output sinks

```task
id: MKTT-WP-0011-T006
status: todo
priority: high
state_hub_task_id: "902707d7-46fe-45d6-a9ec-b85763065ff9"
```

Support writing one or many Markdown outputs from templates, generated content,
or composed results. Outputs must be path-safe, reproducible, and traceable to
their source data.

Output: output sink API, path-safety checks, and provenance manifests.

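Path safety and traceability can both be shown in a few lines. The sketch
below assumes every output must land inside a single output root and records
SHA-256 digests of sources and outputs in a manifest; the manifest layout and
the function names are illustrative assumptions, not a defined format.

```python
import hashlib
import tempfile
from pathlib import Path


def safe_target(root: Path, relative: str) -> Path:
    """Resolve `relative` under `root`, rejecting paths that escape it."""
    root = root.resolve()
    target = (root / relative).resolve()
    if root not in target.parents:
        raise ValueError(f"output path escapes the output root: {relative}")
    return target


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def write_outputs(
    root: Path, outputs: dict[str, str], sources: dict[str, str]
) -> dict:
    """Write each output and return a manifest linking it to source digests."""
    # Validate every path first so a bad entry prevents any partial write.
    targets = {rel: safe_target(root, rel) for rel in sorted(outputs)}
    manifest = {
        "sources": {name: digest(text) for name, text in sorted(sources.items())},
        "outputs": {},
    }
    for rel, target in targets.items():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(outputs[rel], encoding="utf-8")
        manifest["outputs"][rel] = {"sha256": digest(outputs[rel])}
    return manifest


with tempfile.TemporaryDirectory() as tmp:
    manifest = write_outputs(
        Path(tmp),
        {"out/release-notes.md": "# Release Notes\n"},
        {"docs/adr/0001.md": "## Decision\n\nUse Markdown.\n"},
    )
    print(sorted(manifest["outputs"]))
```

Comparing digests between two runs gives a cheap reproducibility check: the
same source digests should lead to the same output digests.
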
## P11.7 - Add workflow CLI

```task
id: MKTT-WP-0011-T007
status: todo
priority: high
state_hub_task_id: "ccc26867-5724-4205-b3fe-a8b9d046775d"
```

Add:

```text
mkt workflow inspect <workflow.md>
mkt workflow plan <workflow.md>
mkt workflow run <workflow.md>
```

Include JSON/YAML outputs for agent use.

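The command shape above maps naturally onto nested subcommands. The following
`argparse` sketch is only an illustration of that structure: the `--format`
flag and the report contents are assumptions (the workplan asks for JSON/YAML
output but does not name a flag), and the existing `mkt` entry point may be
built with a different CLI library.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mkt")
    commands = parser.add_subparsers(dest="command", required=True)
    workflow = commands.add_parser("workflow", help="work with dataflow workflows")
    actions = workflow.add_subparsers(dest="action", required=True)
    for action in ("inspect", "plan", "run"):
        sub = actions.add_parser(action)
        sub.add_argument("workflow_file")
        # Assumption: a --format flag selects machine-readable output.
        sub.add_argument("--format", choices=("text", "json"), default="text")
    return parser


def main(argv: list[str]) -> str:
    """Parse arguments and return a placeholder report for the chosen action."""
    args = build_parser().parse_args(argv)
    report = {"action": args.action, "workflow": args.workflow_file}
    if args.format == "json":
        return json.dumps(report, sort_keys=True)
    return f"{args.action}: {args.workflow_file}"


print(main(["workflow", "plan", "release.md", "--format", "json"]))
```

Returning the rendered report from `main` instead of printing inside it keeps
the CLI layer thin and easy to test, in line with the CLI/API parity goal.
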
## P11.8 - Add representative end-to-end examples

```task
id: MKTT-WP-0011-T008
status: todo
priority: high
state_hub_task_id: "f8501ea6-1ead-477d-8f64-c196e7edfe68"
```

Create examples covering:

- multiple ADRs -> release notes
- contract data -> generated documents
- source snippets -> docs
- deterministic summary -> optional assisted review -> final Markdown

## Exit Criteria

- A non-programmer can write a Markdown/YAML workflow that extracts data from
  Markdown documents and generates new Markdown outputs.
- The same workflow produces the same outputs for identical inputs.
- Assisted steps are optional and externally supplied.
- Diagnostics identify which source, step, or output failed.
- The implementation remains compatible with future references/processors,
  cache/provenance, context engines, and access-control policy.