# Markdown Dataflow Workflow Assessment

Date: 2026-05-04

## Question

Can `markitect-tool` support workflows that grab data from one or more Markdown documents, process it deterministically or with optional LLM assistance, and inject the result into one or more Markdown outputs?

## Short Answer

Partially today, but not yet as a clean framework. The current implementation provides the right primitives:

- parse Markdown and frontmatter
- query/extract structured content
- transform documents
- compose files
- include/transclude content
- render deterministic templates
- generate stubs from contracts
- run simple generation plans
- expose a provider-neutral assisted-generation hook

However, the user still has to orchestrate these steps manually through shell commands or Python code. There is no declarative pipeline model that says:

```text
sources
  -> extracted data products
  -> deterministic processors
  -> assisted processors
  -> templates/generation
  -> multiple outputs
```

That missing layer is where markitect can become much more practical.
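To make the chain above concrete, here is a minimal, stdlib-only sketch of the deterministic path wired by hand in Python, roughly the kind of glue code a user has to write today. The helpers `parse_frontmatter`, `extract_section`, and `run_pipeline` are hypothetical illustrations, not markitect-tool's API; frontmatter is assumed to be simple `key: value` lines, and the assisted stage is omitted.

```python
import re
from string import Template

# Hypothetical helpers for illustration; these are not markitect-tool functions.


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split simple `key: value` frontmatter (between --- lines) from the body."""
    match = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not match:
        return {}, text
    meta: dict[str, str] = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, text[match.end():]


def extract_section(body: str, heading: str) -> str:
    """Return the text under `## heading`, up to the next level-2 heading."""
    pattern = rf"^## {re.escape(heading)}\n(.*?)(?=^## |\Z)"
    match = re.search(pattern, body, re.DOTALL | re.MULTILINE)
    return match.group(1).strip() if match else ""


def run_pipeline(sources: dict[str, str]) -> dict[str, str]:
    # sources -> extracted data products: accepted decisions only
    decisions = []
    for name, text in sorted(sources.items()):
        meta, body = parse_frontmatter(text)
        if meta.get("status") == "accepted":
            decisions.append(f"- {name}: {extract_section(body, 'Decision')}")
    # deterministic processor: render a template from the extracted data
    summary = Template("## Accepted Decisions\n\n$items\n").substitute(
        items="\n".join(decisions)
    )
    # multiple outputs fed from the same data product
    return {
        "out/release-notes.md": "# Release Notes\n\n" + summary,
        "out/summary.md": summary,
    }


docs = {
    "adr-001.md": "---\nstatus: accepted\n---\n# ADR 1\n\n## Decision\nUse Markdown.\n\n## Consequences\nNone.\n",
    "adr-002.md": "---\nstatus: rejected\n---\n# ADR 2\n\n## Decision\nUse XML.\n",
}
outputs = run_pipeline(docs)
```

Every stage is hard-coded here; a declarative pipeline would let the same sources, selectors, and outputs be described in a plan instead of in glue code.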
## Comparison with markitect-main

`markitect-main` had a broader set of separate experiments:

- template parser and renderer
- data-driven draft generation
- prompt/LLM quality gates
- transclusion with variables and conditionals
- batch/document-processing commands
- infospace and spaces workflows
- cache/reference graph ideas

The new implementation is better in its core shape:

- smaller, provider-neutral modules
- deterministic behavior before optional LLM use
- CLI/API parity
- contracts as a stronger rule source than raw schemas
- structured query/extract feeding generation
- explicit safety boundaries for includes
- tests around each primitive

What we sacrificed:

- no first-class batch/pipeline runner yet
- no prompt/LLM workflow execution in core
- no variable/conditional transclusion yet
- no data-driven multi-record draft generator yet
- no workflow provenance graph tying inputs to outputs
- no multi-output orchestration
- no built-in object/data shaping between extraction and rendering

This is a good trade-off for the foundation, but the pipeline layer still needs to be built.

## Desired Workflow Shape

A future pipeline plan should be Markdown-native and inspectable:

````markdown
# Release Note Pipeline

```yaml workflow
sources:
  decisions:
    glob: docs/adr/*.md
    extract:
      accepted:
        selector: sections[heading=Decision]
      status:
        selector: frontmatter.status
steps:
  summarize:
    kind: deterministic.template
    template: templates/release-summary.md
    data:
      decisions: ${sources.decisions.accepted}
  assisted_review:
    kind: assisted.generation
    input: ${steps.summarize.markdown}
    prompt: prompts/reviewer.md
    optional: true
outputs:
  release_notes:
    template: templates/release-notes.md
    data:
      summary: ${steps.summarize.markdown}
      review: ${steps.assisted_review.markdown}
    output: out/release-notes.md
```
````

This should remain executable without LLM support. Assisted steps should be optional, externally supplied, and policy-aware.
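The `${...}` references in the plan imply a small data expression model. The sketch below, assuming references are dotted paths into a nested dict of earlier results, resolves them and shows one way an optional assisted step could be skipped when no provider is supplied, so the plan still runs without an LLM. `lookup`, `resolve`, `run_assisted`, `Provider`, and the `skipped` flag are hypothetical names, not markitect-tool's API; the `generate` callable stands in for whatever provider hook is passed in.

```python
import re
from typing import Any, Callable, Optional

# Hypothetical sketch: none of these names are markitect-tool APIs.
REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def lookup(results: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as `steps.summarize.markdown`."""
    node: Any = results
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"unresolved reference: ${{{path}}}")
        node = node[part]
    return node


def resolve(value: Any, results: dict[str, Any]) -> Any:
    """Replace `${...}` references in strings, dicts, and lists."""
    if isinstance(value, str):
        whole = REF.fullmatch(value)
        if whole:  # a bare reference keeps the referenced value's type
            return lookup(results, whole.group(1))
        return REF.sub(lambda m: str(lookup(results, m.group(1))), value)
    if isinstance(value, dict):
        return {k: resolve(v, results) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, results) for v in value]
    return value


Provider = Callable[[str, str], str]  # (prompt path, input Markdown) -> Markdown


def run_assisted(step: dict[str, Any], results: dict[str, Any],
                 generate: Optional[Provider]) -> dict[str, Any]:
    """Run an assisted step, or skip it if it is optional and no provider is given."""
    if generate is None:
        if step.get("optional", False):
            # Keep the pipeline deterministic: record an empty, skipped result.
            return {"markdown": "", "skipped": True}
        raise RuntimeError("required assisted step has no provider")
    text = resolve(step["input"], results)
    return {"markdown": generate(step["prompt"], text), "skipped": False}


results: dict[str, Any] = {
    "sources": {"decisions": {"accepted": ["Use Markdown."]}},
    "steps": {"summarize": {"markdown": "- Use Markdown."}},
}
review_step = {
    "input": "${steps.summarize.markdown}",
    "prompt": "prompts/reviewer.md",
    "optional": True,
}
results["steps"]["assisted_review"] = run_assisted(review_step, results, generate=None)
output_data = resolve(
    {"summary": "${steps.summarize.markdown}", "review": "${steps.assisted_review.markdown}"},
    results,
)
```

Recording a skipped optional step as empty Markdown, rather than omitting it, keeps downstream references such as `${steps.assisted_review.markdown}` resolvable; whether an empty string or an explicit placeholder is the better default is a policy decision.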
## Architecture Gap

The missing generalized layer needs:

- source collectors for Markdown files, globs, directories, and future indexes
- named extracted data products
- a small data expression model for referencing previous results
- deterministic step registry
- optional assisted step registry
- multi-output sinks
- provenance and diagnostics per step
- dry-run/plan/inspect modes
- caching and invalidation hooks
- policy hooks before assisted steps or sensitive output writes

## Relationship to Existing Workplans

- `MKTT-WP-0003` gives the primitive surface.
- `MKTT-WP-0010` gives richer references, processors, regions, and chunks.
- `MKTT-WP-0006` gives backend/provenance/cache interfaces.
- `MKTT-WP-0005` gives runtime context and form/assessment engines.
- `MKTT-WP-0011` should become the declarative pipeline/workflow layer that wires these together.

## Recommendation

Do not squeeze this into P3.7. P3.7 should stay focused on lightweight caching and incremental processing for the current primitives.

Create a new workplan for declarative Markdown dataflow pipelines. It should be P1/P2: important enough not to forget, but best implemented after the reference and processor model has had at least its first architecture pass.
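Several items in the Architecture Gap list (a deterministic step registry, per-step provenance, and dry-run/plan modes) fit together in a small core. This is a minimal sketch under assumed names: `register`, `execute`, `Provenance`, `Run`, and the `deterministic.join` step kind are all hypothetical, not existing markitect-tool APIs, and caching, assisted steps, and policy hooks are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical sketch: names below are illustrative, not markitect-tool's API.
StepFn = Callable[[dict[str, Any]], dict[str, Any]]
REGISTRY: dict[str, StepFn] = {}


def register(kind: str) -> Callable[[StepFn], StepFn]:
    """Decorator that adds a deterministic step implementation to the registry."""
    def decorator(fn: StepFn) -> StepFn:
        REGISTRY[kind] = fn
        return fn
    return decorator


@register("deterministic.join")
def join_step(params: dict[str, Any]) -> dict[str, Any]:
    # Join already-extracted Markdown fragments into one block.
    return {"markdown": params.get("separator", "\n").join(params["items"])}


@dataclass
class Provenance:
    step: str
    kind: str
    inputs: list[str]
    status: str  # "planned" in a dry run, "done" after execution


@dataclass
class Run:
    results: dict[str, dict[str, Any]] = field(default_factory=dict)
    provenance: list[Provenance] = field(default_factory=list)


def execute(steps: dict[str, dict[str, Any]], dry_run: bool = False) -> Run:
    run = Run()
    for name, spec in steps.items():  # dicts preserve declaration order
        kind = spec["kind"]
        if kind not in REGISTRY:
            raise ValueError(f"unknown step kind: {kind}")
        inputs = list(spec.get("inputs", []))
        if dry_run:
            run.provenance.append(Provenance(name, kind, inputs, "planned"))
            continue
        params = {k: v for k, v in spec.items() if k not in ("kind", "inputs")}
        run.results[name] = REGISTRY[kind](params)
        run.provenance.append(Provenance(name, kind, inputs, "done"))
    return run


# Usage with hypothetical ADR paths as declared inputs.
steps = {
    "summarize": {
        "kind": "deterministic.join",
        "inputs": ["docs/adr/0001.md", "docs/adr/0002.md"],
        "items": ["- Use Markdown.", "- Keep LLM use optional."],
    }
}
plan = execute(steps, dry_run=True)
run = execute(steps)
```

A dry run validates step kinds and records the plan without executing anything, producing the same provenance records a real run does, only with a different status; that symmetry is what would let `plan` and `inspect` modes share code with execution.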