# Markdown Dataflow Workflow Assessment

Date: 2026-05-04

## Question
Can markitect-tool support workflows that grab data from one or more Markdown
documents, process it deterministically or with optional LLM assistance, and
inject the result into one or more Markdown outputs?

## Short Answer

Partially today, but not yet as a clean framework.

The current implementation provides the right primitives:
- parse Markdown and frontmatter
- query/extract structured content
- transform documents
- compose files
- include/transclude content
- render deterministic templates
- generate stubs from contracts
- run simple generation plans
- expose a provider-neutral assisted-generation hook

However, users still have to orchestrate these steps by hand, through shell commands or Python code. There is no declarative pipeline model that expresses a flow like:

`sources -> extracted data products -> deterministic processors -> assisted processors -> templates/generation -> multiple outputs`

That missing layer is where markitect can become much more practical.
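
The chain above can be sketched as a minimal Python dataflow, assuming each stage is a plain callable. `Pipeline`, `adr_titles`, `bullet_list`, and `release_notes` are hypothetical names used only for illustration. They are not markitect-tool's API. The sketch shows the intended shape: extraction feeds a deterministic renderer, the assisted stage is an optional injected callable, and several named outputs read from shared step results.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stage signatures for illustration; these are not markitect-tool APIs.
Sources = dict[str, str]                  # path -> Markdown text
Extract = Callable[[Sources], dict]       # sources -> named data product
Render = Callable[[dict], str]            # data product -> Markdown (deterministic)
Assist = Callable[[str], str]             # Markdown -> reviewed Markdown (e.g. an LLM)
Sink = Callable[[dict[str, str]], str]    # step results -> one output document


@dataclass
class Pipeline:
    extract: Extract
    render: Render
    outputs: dict[str, Sink]
    assist: Optional[Assist] = None       # no provider supplied => step is skipped

    def run(self, sources: Sources) -> dict[str, str]:
        data = self.extract(sources)
        results = {"summary": self.render(data)}
        # The assisted step is optional: without a provider the pipeline still
        # completes deterministically, with an empty review.
        results["review"] = self.assist(results["summary"]) if self.assist else ""
        return {target: sink(results) for target, sink in self.outputs.items()}


def adr_titles(sources: Sources) -> dict:
    # Take the first line of each file as its title, in stable path order.
    return {"titles": [sources[p].splitlines()[0].lstrip("# ").strip() for p in sorted(sources)]}


def bullet_list(data: dict) -> str:
    return "\n".join(f"- {title}" for title in data["titles"])


def release_notes(results: dict[str, str]) -> str:
    review = f"\n\n## Review\n\n{results['review']}" if results["review"] else ""
    return f"# Release Notes\n\n{results['summary']}{review}\n"


pipeline = Pipeline(adr_titles, bullet_list, {"out/release-notes.md": release_notes})
docs = {"docs/adr/0002.md": "# Use SQLite\n", "docs/adr/0001.md": "# Adopt contracts\n"}
print(pipeline.run(docs)["out/release-notes.md"])
```

The hard-wired `summary` and `review` slots are the part a declarative plan would replace with named steps and references.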

## Comparison with markitect-main

markitect-main covered more ground, spread across separate experiments:
- template parser and renderer
- data-driven draft generation
- prompt/LLM quality gates
- transclusion with variables and conditionals
- batch/document-processing commands
- infospace and spaces workflows
- cache/reference graph ideas

The new implementation is better in its core shape:
- smaller, provider-neutral modules
- deterministic behavior before optional LLM use
- CLI/API parity
- contracts as a stronger rule source than raw schemas
- structured query/extract feeding generation
- explicit safety boundaries for includes
- tests around each primitive

What we sacrificed:
- no first-class batch/pipeline runner yet
- no prompt/LLM workflow execution in core
- no variable/conditional transclusion yet
- no data-driven multi-record draft generator yet
- no workflow provenance graph tying inputs to outputs
- no multi-output orchestration
- no built-in object/data shaping between extraction and rendering

This is a good trade for the foundation, but the pipeline layer needs to exist.

## Desired Workflow Shape

A future pipeline plan should be Markdown-native and inspectable:

````markdown
# Release Note Pipeline

```yaml workflow
sources:
  decisions:
    glob: docs/adr/*.md
    extract:
      accepted:
        selector: sections[heading=Decision]
      status:
        selector: frontmatter.status

steps:
  summarize:
    kind: deterministic.template
    template: templates/release-summary.md
    data:
      decisions: ${sources.decisions.accepted}

  assisted_review:
    kind: assisted.generation
    input: ${steps.summarize.markdown}
    prompt: prompts/reviewer.md
    optional: true

outputs:
  release_notes:
    template: templates/release-notes.md
    data:
      summary: ${steps.summarize.markdown}
      review: ${steps.assisted_review.markdown}
    output: out/release-notes.md
```
````

The pipeline should remain executable without LLM support. Assisted steps should be optional, externally supplied, and policy-aware.
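
The two selector forms in the plan, `frontmatter.status` and `sections[heading=Decision]`, can be illustrated with a small standalone Python sketch. It does not use markitect-tool's real parser or query API. `split_frontmatter`, `sections_with_heading`, and `select` are hypothetical names. Frontmatter handling is limited to flat `key: value` pairs (no YAML library), and the section semantics are an assumption: a section runs until the next heading of the same or higher level.

```python
import re

# Naive stand-in for a real Markdown parser: flat "key: value" frontmatter and
# ATX headings only. It does not skip fenced code blocks or parse nested YAML.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")


def split_frontmatter(markdown: str) -> tuple[dict[str, str], str]:
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            meta = {}
            for line in lines[1:end]:
                key, sep, value = line.partition(":")
                if sep:
                    meta[key.strip()] = value.strip()
            return meta, "\n".join(lines[end + 1:])
    return {}, markdown  # unterminated frontmatter: treat everything as body


def sections_with_heading(body: str, title: str) -> list[str]:
    """Bodies of sections titled `title`; each ends at the next heading of equal or higher level."""
    found, current, level = [], None, 0
    for line in body.splitlines():
        match = HEADING.match(line)
        if match and current is not None and len(match.group(1)) <= level:
            found.append("\n".join(current).strip())
            current = None
        if match and current is None and match.group(2) == title:
            current, level = [], len(match.group(1))
            continue
        if current is not None:
            current.append(line)
    if current is not None:
        found.append("\n".join(current).strip())
    return found


def select(markdown: str, selector: str):
    """Evaluate the two selector forms used in the plan (hypothetical semantics)."""
    meta, body = split_frontmatter(markdown)
    if selector.startswith("frontmatter."):
        return meta.get(selector.removeprefix("frontmatter."))
    section = re.fullmatch(r"sections\[heading=(.+)\]", selector)
    if section:
        return sections_with_heading(body, section.group(1))
    raise ValueError(f"unsupported selector: {selector}")


adr = "---\nstatus: accepted\n---\n# ADR 1\n\n## Decision\nUse SQLite.\n"
print(select(adr, "frontmatter.status"))          # accepted
print(select(adr, "sections[heading=Decision]"))  # ['Use SQLite.']
```

The existing parse and query/extract primitives presumably cover this already. The sketch only pins down what each selector is assumed to return, which a pipeline would need to specify.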
## Architecture Gap
The missing generalized layer needs:
- source collectors for Markdown files, globs, directories, and future indexes
- named extracted data products
- a small data expression model for referencing previous results
- deterministic step registry
- optional assisted step registry
- multi-output sinks
- provenance and diagnostics per step
- dry-run/plan/inspect modes
- caching and invalidation hooks
- policy hooks before assisted steps or sensitive output writes
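
Two of these pieces, the data expression model and dry-run planning, can be sketched together in Python. The sketch assumes `${dotted.path}` references like those in the plan above and resolves them against a nested dict of prior results. It orders steps with the standard-library `graphlib.TopologicalSorter` (Python 3.9+). `references`, `resolve`, `interpolate`, and `plan` are hypothetical names, not markitect-tool functions.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical reference syntax: ${a.b.c}, a dotted path into prior results.
REF = re.compile(r"\$\{([A-Za-z_][\w.]*)\}")


def references(spec) -> list[str]:
    """Collect every ${...} path mentioned anywhere in a nested step spec."""
    if isinstance(spec, str):
        return REF.findall(spec)
    if isinstance(spec, dict):
        return [ref for value in spec.values() for ref in references(value)]
    if isinstance(spec, list):
        return [ref for value in spec for ref in references(value)]
    return []


def resolve(path: str, context: dict):
    """Walk a dotted path; a KeyError here would become a per-step diagnostic."""
    node = context
    for part in path.split("."):
        node = node[part]
    return node


def interpolate(spec, context: dict):
    """Substitute references; a string that is exactly one ${...} keeps the value's type."""
    if isinstance(spec, str):
        whole = REF.fullmatch(spec)
        if whole:
            return resolve(whole.group(1), context)
        return REF.sub(lambda m: str(resolve(m.group(1), context)), spec)
    if isinstance(spec, dict):
        return {key: interpolate(value, context) for key, value in spec.items()}
    if isinstance(spec, list):
        return [interpolate(value, context) for value in spec]
    return spec


def plan(steps: dict[str, dict]) -> list[str]:
    """Dry-run: order steps so each runs after the steps its references name."""
    graph = {}
    for name, spec in steps.items():
        deps = {ref.split(".")[1] for ref in references(spec) if ref.startswith("steps.")}
        unknown = deps - set(steps)
        if unknown:
            raise ValueError(f"step {name!r} references unknown steps: {sorted(unknown)}")
        graph[name] = deps
    # static_order() raises graphlib.CycleError if references form a cycle.
    return list(TopologicalSorter(graph).static_order())


steps = {
    "assisted_review": {"kind": "assisted.generation", "input": "${steps.summarize.markdown}"},
    "summarize": {"kind": "deterministic.template",
                  "data": {"decisions": "${sources.decisions.accepted}"}},
}
print(plan(steps))  # ['summarize', 'assisted_review']
```

A lone reference keeps the referenced value's type, so `decisions` stays a list rather than a stringified list. A reference cycle or a dangling step name surfaces during planning rather than mid-run, which is what a dry-run/inspect mode needs.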
## Relationship to Existing Workplans
- `MKTT-WP-0003` gives the primitive surface.
- `MKTT-WP-0010` gives richer references, processors, regions, and chunks.
- `MKTT-WP-0006` gives backend/provenance/cache interfaces.
- `MKTT-WP-0005` gives runtime context and form/assessment engines.
- `MKTT-WP-0011` should become the declarative pipeline/workflow layer that
wires these together.
## Recommendation
Do not squeeze this into P3.7. P3.7 should stay focused on lightweight caching
and incremental processing for the current primitives.

Create a new workplan for declarative Markdown dataflow pipelines at P1/P2 priority: important enough not to forget, but best implemented after the reference and processor model has had at least its first architecture pass.