Workplan for dataflow pipeline workflows

2026-05-04 01:22:45 +02:00
parent 1a1b5ab39c
commit 8260a66528
6 changed files with 337 additions and 1 deletions

@@ -0,0 +1,145 @@
# Markdown Dataflow Workflow Assessment
Date: 2026-05-04
## Question
Can `markitect-tool` support workflows that grab data from one or more Markdown
documents, process it deterministically or with optional LLM assistance, and
inject the result into one or more Markdown outputs?
## Short Answer
Partially today, but not yet as a clean framework.
The current implementation provides the right primitives:
- parse Markdown and frontmatter
- query/extract structured content
- transform documents
- compose files
- include/transclude content
- render deterministic templates
- generate stubs from contracts
- run simple generation plans
- expose a provider-neutral assisted-generation hook
However, the user still has to orchestrate these steps manually through shell
commands or Python code. There is no declarative pipeline model that says:
```text
sources -> extracted data products -> deterministic processors -> assisted
processors -> templates/generation -> multiple outputs
```
That missing layer is where markitect can become much more practical.
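The stage chain above can be sketched in a few lines of plain Python. This is only an illustration of the shape, not the `markitect-tool` API: `extract_section`, `run_pipeline`, and the `assist` callable are hypothetical names, extraction is a simple regex over headings, and rendering uses the standard library's `string.Template`.

```python
import re
from string import Template
from typing import Callable, Optional


def extract_section(markdown: str, heading: str) -> str:
    """Return the body under the first heading titled `heading`, or ''."""
    pattern = rf"^#+\s+{re.escape(heading)}\s*\n(.*?)(?=^#+\s|\Z)"
    match = re.search(pattern, markdown, flags=re.MULTILINE | re.DOTALL)
    return match.group(1).strip() if match else ""


def run_pipeline(
    sources: dict[str, str],
    heading: str,
    templates: dict[str, str],
    assist: Optional[Callable[[str], str]] = None,
) -> dict[str, str]:
    """Hypothetical end-to-end run: sources -> products -> steps -> outputs."""
    # Extracted data products: one named value per source document.
    products = {name: extract_section(text, heading) for name, text in sources.items()}
    # Deterministic processor: a sorted bullet list of non-empty products.
    summary = "\n".join(
        f"- {name}: {body}" for name, body in sorted(products.items()) if body
    )
    # Assisted processor: optional, supplied by the caller, skipped if absent.
    review = assist(summary) if assist is not None else ""
    # Multiple outputs: every template is rendered from the same data.
    return {
        out: Template(tpl).substitute(summary=summary, review=review)
        for out, tpl in templates.items()
    }
```

Calling `run_pipeline` without `assist` yields the same outputs on every run, which is the property the deterministic-first design is meant to preserve.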
## Comparison with markitect-main
`markitect-main` had more separate experiments:
- template parser and renderer
- data-driven draft generation
- prompt/LLM quality gates
- transclusion with variables and conditionals
- batch/document-processing commands
- infospace and spaces workflows
- cache/reference graph ideas
The new implementation is better in its core shape:
- smaller, provider-neutral modules
- deterministic behavior before optional LLM use
- CLI/API parity
- contracts as a stronger rule source than raw schemas
- structured query/extract feeding generation
- explicit safety boundaries for includes
- tests around each primitive
What we sacrificed:
- no first-class batch/pipeline runner yet
- no prompt/LLM workflow execution in core
- no variable/conditional transclusion yet
- no data-driven multi-record draft generator yet
- no workflow provenance graph tying inputs to outputs
- no multi-output orchestration
- no built-in object/data shaping between extraction and rendering
This is a good trade for the foundation, but the pipeline layer needs to exist.
## Desired Workflow Shape
A future pipeline plan should be Markdown-native and inspectable:
````markdown
# Release Note Pipeline
```yaml workflow
sources:
  decisions:
    glob: docs/adr/*.md
    extract:
      accepted:
        selector: sections[heading=Decision]
      status:
        selector: frontmatter.status
steps:
  summarize:
    kind: deterministic.template
    template: templates/release-summary.md
    data:
      decisions: ${sources.decisions.accepted}
  assisted_review:
    kind: assisted.generation
    input: ${steps.summarize.markdown}
    prompt: prompts/reviewer.md
    optional: true
outputs:
  release_notes:
    template: templates/release-notes.md
    data:
      summary: ${steps.summarize.markdown}
      review: ${steps.assisted_review.markdown}
    output: out/release-notes.md
```
````
This should remain executable without LLM support. Assisted steps should be
optional, externally supplied, and policy-aware.
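One way to read the `${...}` references and the `optional: true` flag is sketched below. It assumes dotted paths resolved against a nested dictionary and an assistant passed in as a plain callable; `resolve`, `run_steps`, and the `handlers` mapping are hypothetical names for illustration, not existing `markitect-tool` functions.

```python
import re
from typing import Any, Callable, Mapping, Optional

REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def lookup(path: str, scope: Mapping[str, Any]) -> Any:
    """Walk a dotted path such as 'steps.summarize.markdown' through nested dicts."""
    value: Any = scope
    for key in path.split("."):
        value = value[key]
    return value


def resolve(value: Any, scope: Mapping[str, Any]) -> Any:
    """Replace ${...} references inside strings and nested dicts."""
    if isinstance(value, dict):
        return {k: resolve(v, scope) for k, v in value.items()}
    if isinstance(value, str):
        whole = REF.fullmatch(value)
        if whole:
            # A value that is exactly one reference keeps its native type.
            return lookup(whole.group(1), scope)
        return REF.sub(lambda m: str(lookup(m.group(1), scope)), value)
    return value


def run_steps(
    steps: dict[str, dict],
    scope: dict[str, Any],
    handlers: dict[str, Callable[[Any], str]],
    assistant: Optional[Callable[[str], str]] = None,
) -> dict[str, dict]:
    """Run steps in declaration order; skip optional assisted steps without an assistant."""
    results: dict[str, dict] = {}
    scope = {**scope, "steps": results}  # later steps can reference earlier ones
    for name, step in steps.items():
        if step["kind"].startswith("assisted."):
            if assistant is None:
                if not step.get("optional", False):
                    raise RuntimeError(f"step {name!r} requires an assistant")
                results[name] = {"markdown": "", "skipped": True}
                continue
            results[name] = {"markdown": assistant(resolve(step["input"], scope))}
        else:
            data = resolve(step.get("data", {}), scope)
            results[name] = {"markdown": handlers[step["kind"]](data)}
    return results
```

In this sketch a skipped assisted step still produces an empty `markdown` value, so an output that references `${steps.assisted_review.markdown}` renders without special-casing.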
## Architecture Gap
The missing generalized layer needs:
- source collectors for Markdown files, globs, directories, and future indexes
- named extracted data products
- a small data expression model for referencing previous results
- deterministic step registry
- optional assisted step registry
- multi-output sinks
- provenance and diagnostics per step
- dry-run/plan/inspect modes
- caching and invalidation hooks
- policy hooks before assisted steps or sensitive output writes
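Three of the items above (a deterministic step registry, per-step provenance, and a dry-run mode) fit together as in the following sketch. Everything here is an assumption for illustration: the `REGISTRY` dictionary, the `register` decorator, the `StepRecord` fields, and the step kinds `deterministic.strip` and `deterministic.upper` are invented names, and provenance is reduced to SHA-256 digests of each step's input and output.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical registry: step kind -> pure text-to-text function.
REGISTRY: dict[str, Callable[[str], str]] = {}


def register(kind: str):
    """Decorator that adds a deterministic step under the given kind."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        REGISTRY[kind] = fn
        return fn
    return decorator


@register("deterministic.strip")
def strip_text(text: str) -> str:
    return text.strip()


@register("deterministic.upper")
def upper_text(text: str) -> str:
    return text.upper()


@dataclass
class StepRecord:
    """Provenance for one step; digests are None when the step was not executed."""
    step: str
    kind: str
    input_sha256: Optional[str]
    output_sha256: Optional[str]


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run(plan: list[tuple[str, str]], text: str, dry_run: bool = False):
    """Run (name, kind) steps in order; in dry-run mode only validate and list them."""
    records: list[StepRecord] = []
    for name, kind in plan:
        if kind not in REGISTRY:
            raise KeyError(f"unknown step kind: {kind}")
        if dry_run:
            records.append(StepRecord(name, kind, None, None))
            continue
        output = REGISTRY[kind](text)
        records.append(StepRecord(name, kind, digest(text), digest(output)))
        text = output
    return text, records
```

Because each record stores the input digest, the same value could plausibly serve as a cache key for the caching and invalidation hooks, though that design is not settled in this assessment.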
## Relationship to Existing Workplans
- `MKTT-WP-0003` gives the primitive surface.
- `MKTT-WP-0010` gives richer references, processors, regions, and chunks.
- `MKTT-WP-0006` gives backend/provenance/cache interfaces.
- `MKTT-WP-0005` gives runtime context and form/assessment engines.
- `MKTT-WP-0011` should become the declarative pipeline/workflow layer that
wires these together.
## Recommendation
Do not squeeze this into P3.7. P3.7 should stay focused on lightweight caching
and incremental processing for the current primitives.
Create a new workplan for declarative Markdown dataflow pipelines. It should be
P1/P2: important enough not to forget, but best implemented after the reference
and processor models have had at least a first architecture pass.

@@ -35,6 +35,7 @@ and descriptions mirror the operational view.
| `MKTT-WP-0010` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Trigger is satisfied; keep as the richer content-reference, processor, explode/implode, and weave/tangle track. |
| `MKTT-WP-0007` | P2 | todo | `MKTT-WP-0006` | First practical cache backend use case: AST/JSONPath/SQLite/FTS. |
| `MKTT-WP-0005` | P2 | todo | `MKTT-WP-0003`, `MKTT-WP-0004` | Pick up when generation/form/context or semantic assessment pressure appears. |
| `MKTT-WP-0011` | P2 | todo | `MKTT-WP-0003`; task-level triggers: `MKTT-WP-0010-T001`, `MKTT-WP-0010-T005` | Declarative Markdown dataflow workflows: source extraction, deterministic/assisted processing, and multi-output generation. |
| `MKTT-WP-0009` | P2 | todo | `MKTT-WP-0006` | Establish access-control gateway before security-sensitive cache/context use. |
| `MKTT-WP-0008` | P3 | todo | `MKTT-WP-0006`, `MKTT-WP-0007`, `MKTT-WP-0009` | Agent working-memory cache after backend and policy floor are available. |
@@ -53,6 +54,12 @@ context-memory, and access-control architecture before those become rigid.
These are mixed task/workstream dependencies. State Hub does not currently model
them natively.
`MKTT-WP-0011` captures the practical workflow layer that wires existing
primitives together: Markdown sources, selectors, deterministic processors,
optional assisted generation hooks, and multiple Markdown outputs. It should not
block P3.7, but it should follow the first reference model and processor
registry decisions in `MKTT-WP-0010`.
## State Hub Mirror
Native State Hub dependency edges should mirror the whole-workstream
@@ -69,6 +76,7 @@ dependencies:
- `MKTT-WP-0007 -> MKTT-WP-0006`
- `MKTT-WP-0005 -> MKTT-WP-0003`
- `MKTT-WP-0005 -> MKTT-WP-0004`
- `MKTT-WP-0011 -> MKTT-WP-0003`
- `MKTT-WP-0009 -> MKTT-WP-0006`
- `MKTT-WP-0008 -> MKTT-WP-0006`
- `MKTT-WP-0008 -> MKTT-WP-0007`