---
id: MKTT-WP-0011
type: workplan
title: "Markdown Dataflow Pipeline Workflows"
domain: markitect
status: done
owner: markitect-tool
topic_slug: markitect
planning_priority: P2
planning_order: 75
depends_on_workplans:
  - MKTT-WP-0003
depends_on_tasks:
  - MKTT-WP-0010-T001
  - MKTT-WP-0010-T005
related_workplans:
  - MKTT-WP-0005
  - MKTT-WP-0006
  - MKTT-WP-0008
  - MKTT-WP-0009
created: "2026-05-04"
updated: "2026-05-04"
state_hub_workstream_id: "ed4c491d-4f81-4df0-af51-5f4bd4d1ad91"
---

# MKTT-WP-0011: Markdown Dataflow Pipeline Workflows

## Purpose

Create a declarative workflow layer for Markdown-to-Markdown dataflow: collecting data from one or more Markdown sources, applying deterministic and optional assisted processing, and injecting the results into one or more Markdown outputs.

## Background

The current toolkit has strong primitives: parse, query, extract, transform, compose, include, template render, contract stub generation, generation plans, and a provider-neutral assisted-generation hook.

What is missing is orchestration. Users can script the pieces manually, but there is not yet a first-class workflow model for:

```text
Markdown sources -> extracted data products -> processors -> generated outputs
```

See `docs/markdown-dataflow-workflow-assessment.md`.

## P11.1 - Define workflow plan model

```task
id: MKTT-WP-0011-T001
status: done
priority: high
state_hub_task_id: "c335cbaa-dfb9-4df5-b1ae-87aaf6097bd8"
```

Define a Markdown/YAML workflow plan format with sources, named data products, steps, outputs, variables, dry-run behavior, diagnostics, and provenance.

Output: workflow schema, examples, and validation diagnostics.

Implemented: `docs/workflow-definition-standard.md` defines the workflow standard with metadata, intent, inputs, outputs, steps, dependencies, conditions, artifacts, permissions, resource requirements, timeouts, retry policies, escalation rules, observability events, and human/agent/system responsibility boundaries.
`WorkflowPlan` preserves all standard sections and unknown extension fields.

## P11.2 - Implement Markdown source collectors

```task
id: MKTT-WP-0011-T002
status: done
priority: high
state_hub_task_id: "16a89801-d96d-437f-a883-81d09586f47a"
```

Collect source data from files, globs, directories, frontmatter paths, selectors, sections, blocks, metrics, and future reference/index backends.

Output: source collector API, selector integration, and tests.

Implemented: workflow inputs collect `file`, `path`, `files`, `glob`, and `directory` Markdown sources, support frontmatter filters, metrics, frontmatter, selectors, and named extractions.

## P11.3 - Implement deterministic step registry

```task
id: MKTT-WP-0011-T003
status: done
priority: high
state_hub_task_id: "808bed93-c7e2-4b34-90f4-f6f961fef503"
```

Create step types for query/extract, transform, compose, include, template render, contract stub generation, contract checks, and data shaping.

Output: deterministic workflow runner with dependency ordering.

Implemented: deterministic workflow runner supports dependency ordering and step kinds `shape`, `extract`, `query`, `template`, `compose`, `transform`, `include`, `contract_stub`, `contract_check`, and `assisted` boundary.

## P11.4 - Implement data expression and binding model

```task
id: MKTT-WP-0011-T004
status: done
priority: high
state_hub_task_id: "ea1ad9d2-3668-4b65-afb4-f490e5bfd0c6"
```

Allow workflow steps and outputs to reference previous results by stable names, for example `${sources.adrs.decisions}` or `${steps.summary.markdown}`.

Output: expression resolver, type checks, and missing-reference diagnostics.

Implemented: `${...}` bindings preserve native types for full-expression values, interpolate text inside longer strings, support dictionary paths, numeric list indexes, and list projection over dictionaries.
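
The binding behavior described for P11.4 can be illustrated with a minimal sketch. This is not the project's resolver: the names `resolve`, `lookup`, and `MissingReference` are hypothetical, and the context layout (`sources`, `steps`) is assumed from the example expressions above. It shows native-type preservation for full-expression values, string interpolation, dictionary paths, numeric list indexes, and list projection over dictionaries.

```python
import re
from typing import Any

# A value that is exactly one ${...} expression keeps its native type.
_FULL = re.compile(r"^\$\{([^{}]+)\}$")
# Expressions embedded in a longer string are interpolated as text.
_PART = re.compile(r"\$\{([^{}]+)\}")


class MissingReference(KeyError):
    """Hypothetical diagnostic raised when a binding path cannot be resolved."""


def lookup(context: Any, path: str) -> Any:
    """Walk a dotted path through dictionaries and lists."""
    value = context
    for part in path.strip().split("."):
        if isinstance(value, dict):
            if part not in value:
                raise MissingReference(path)
            value = value[part]
        elif isinstance(value, list):
            if part.isdigit():
                # Numeric list index, e.g. decisions.0
                index = int(part)
                if index >= len(value):
                    raise MissingReference(path)
                value = value[index]
            else:
                # List projection: collect `part` from every dictionary item.
                try:
                    value = [item[part] for item in value]
                except (KeyError, TypeError):
                    raise MissingReference(path) from None
        else:
            raise MissingReference(path)
    return value


def resolve(template: str, context: dict) -> Any:
    """Resolve ${...} bindings in `template` against `context`."""
    full = _FULL.match(template)
    if full:
        return lookup(context, full.group(1))
    return _PART.sub(lambda m: str(lookup(context, m.group(1))), template)


if __name__ == "__main__":
    ctx = {
        "sources": {"adrs": {"decisions": [{"title": "A"}, {"title": "B"}]}},
        "steps": {"summary": {"markdown": "Hi", "count": 2}},
    }
    print(resolve("${sources.adrs.decisions.title}", ctx))
    print(resolve("Total: ${steps.summary.count}", ctx))
```

Raising a dedicated exception that carries the full path is one way to support the missing-reference diagnostics listed in the task output; the actual diagnostic format is not specified here.
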
## P11.5 - Add optional assisted processing step boundary

```task
id: MKTT-WP-0011-T005
status: done
priority: medium
state_hub_task_id: "ed1adc60-fdd8-4d4c-b4d7-7ce906e641c6"
```

Add assisted step support through the provider-neutral generation hook protocol. The workflow engine must not require provider dependencies and must support dry-run, optional steps, and policy gates before sending data to a provider.

Output: hook adapter interface and tests with fake providers.

Implemented: assisted steps use the provider-neutral generation hook boundary. Without a hook, optional assisted steps are skipped with warning diagnostics and required assisted steps fail. Tests include an injected fake hook.

## P11.6 - Implement multi-output sinks

```task
id: MKTT-WP-0011-T006
status: done
priority: high
state_hub_task_id: "902707d7-46fe-45d6-a9ec-b85763065ff9"
```

Support writing one or many Markdown outputs from templates, generated content, or composed results. Outputs must be path-safe, reproducible, and traceable to their source data.

Output: output sink API, path-safety checks, and provenance manifests.

Implemented: outputs can render content or templates, write multiple Markdown files under a safe output root, support dry-run planning, and emit output records plus provenance/trace events.

## P11.7 - Add workflow CLI

```task
id: MKTT-WP-0011-T007
status: done
priority: high
state_hub_task_id: "ccc26867-5724-4205-b3fe-a8b9d046775d"
```

Add:

```text
mkt workflow inspect
mkt workflow plan
mkt workflow run
```

Include JSON/YAML outputs for agent use.

Implemented:

- `mkt workflow inspect `
- `mkt workflow plan `
- `mkt workflow run `

All commands support text, JSON, and YAML output.
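
The path-safety and dry-run requirements stated for output sinks in P11.6 could be approached as in the sketch below. It is an illustration under assumptions, not the implemented sink API: `safe_output_path`, `write_outputs`, `UnsafeOutputPath`, and the record fields are hypothetical names. The idea is to resolve every target against the output root and reject anything that escapes it before any file is written.

```python
from pathlib import Path


class UnsafeOutputPath(ValueError):
    """Hypothetical error for an output path that escapes the output root."""


def safe_output_path(output_root: str, relative: str) -> Path:
    """Return the resolved target, or raise if it falls outside the root."""
    root = Path(output_root).resolve()
    candidate = (root / relative).resolve()
    # Path.is_relative_to requires Python 3.9 or later.
    if not candidate.is_relative_to(root):
        raise UnsafeOutputPath(f"{relative!r} escapes output root {str(root)!r}")
    return candidate


def write_outputs(output_root: str, outputs: dict[str, str], dry_run: bool = True) -> list[dict]:
    """Validate all targets first, then write them unless dry_run is set."""
    targets = [(safe_output_path(output_root, rel), text) for rel, text in outputs.items()]
    records = []
    for target, text in targets:
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(text, encoding="utf-8")
        # Assumed record shape; the real output records are not specified here.
        records.append({
            "path": str(target),
            "bytes": len(text.encode("utf-8")),
            "written": not dry_run,
        })
    return records
```

Validating every path before the first write means a single unsafe output aborts the whole batch without leaving partial results on disk, which fits the dry-run planning mode described above.
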
## P11.8 - Add representative end-to-end examples

```task
id: MKTT-WP-0011-T008
status: done
priority: high
state_hub_task_id: "f8501ea6-1ead-477d-8f64-c196e7edfe68"
```

Create examples covering:

- multiple ADRs -> release notes
- contract data -> generated documents
- source snippets -> docs
- deterministic summary -> optional assisted review -> final Markdown

Implemented: examples under `examples/workflows/` cover ADR release notes, source snippet extraction, and an optional assisted review boundary.

## Exit Criteria

- A non-programmer can write a Markdown/YAML workflow that extracts data from Markdown documents and generates new Markdown outputs.
- The same workflow is repeatable for identical inputs.
- Assisted steps are optional and external.
- Diagnostics identify which source, step, or output failed.
- The implementation remains compatible with future references/processors, cache/provenance, context engines, and access-control policy.
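
One way to check the repeatability criterion above is to fingerprint the outputs of two runs over identical inputs and compare the digests. The sketch below is an assumption-laden illustration: `fingerprint` and `is_repeatable` are hypothetical helpers, and `run` stands in for any callable that maps workflow inputs to a dictionary of output path to Markdown content.

```python
import hashlib
import json
from typing import Callable


def fingerprint(outputs: dict[str, str]) -> str:
    """Hash outputs in a canonical form so key order does not affect the digest."""
    canonical = json.dumps(outputs, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_repeatable(run: Callable[[dict], dict[str, str]], inputs: dict, attempts: int = 2) -> bool:
    """Run the workflow several times and report whether all digests agree."""
    digests = {fingerprint(run(inputs)) for _ in range(attempts)}
    return len(digests) == 1
```

A check like this only covers deterministic steps; assisted steps are external and would need to be excluded or replaced with a fake hook for the comparison to be meaningful.
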