# Markdown Dataflow Workflow Assessment

Date: 2026-05-04

## Question

Can markitect-tool support workflows that grab data from one or more Markdown documents, process it deterministically or with optional LLM assistance, and inject the result into one or more Markdown outputs?

## Short Answer

Partially today, but not yet as a clean framework.

The current implementation provides the right primitives:

- parse Markdown and frontmatter
- query/extract structured content
- transform documents
- compose files
- include/transclude content
- render deterministic templates
- generate stubs from contracts
- run simple generation plans
- expose a provider-neutral assisted-generation hook

However, the user still has to orchestrate these steps manually through shell commands or Python code. There is no declarative pipeline model that says:

sources -> extracted data products -> deterministic processors -> assisted
processors -> templates/generation -> multiple outputs

That missing layer is where markitect can become much more practical.
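
To illustrate the manual orchestration described above, the first two stages (sources to extracted data products) might look like this in plain Python. This is a minimal sketch using only the standard library. `split_frontmatter`, `extract_section`, and `extract_products` are hypothetical helpers, not markitect-tool APIs, and the frontmatter handling assumes simple `key: value` lines.

```python
import re


def split_frontmatter(text):
    """Split a leading '---' block of simple 'key: value' lines from the body."""
    match = re.match(r"\A---\n(.*?)\n---\n?(.*)\Z", text, re.DOTALL)
    if not match:
        return {}, text
    meta = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, match.group(2)


def extract_section(body, heading):
    """Return the text under the first matching heading, up to the next heading.

    Simplification: any line starting with '#' counts as a heading, so
    '#' lines inside fenced code blocks are not handled.
    """
    collected, inside = [], False
    for line in body.splitlines():
        if line.startswith("#"):
            if inside:
                break
            inside = line.lstrip("#").strip() == heading
            continue
        if inside:
            collected.append(line)
    return "\n".join(collected).strip()


def extract_products(documents):
    """Build named data products from a mapping of path -> Markdown text."""
    products = {"accepted": [], "status": []}
    for path in sorted(documents):  # sorted so the output order is deterministic
        meta, body = split_frontmatter(documents[path])
        products["accepted"].append(extract_section(body, "Decision"))
        products["status"].append(meta.get("status", ""))
    return products
```

Every caller currently has to write glue like this by hand; a declarative plan would name the same products once and let later steps refer to them.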

## Comparison with markitect-main

markitect-main had more separate experiments:

- template parser and renderer
- data-driven draft generation
- prompt/LLM quality gates
- transclusion with variables and conditionals
- batch/document-processing commands
- infospace and spaces workflows
- cache/reference graph ideas

The new implementation is better in its core shape:

- smaller, provider-neutral modules
- deterministic behavior before optional LLM use
- CLI/API parity
- contracts as a stronger rule source than raw schemas
- structured query/extract feeding generation
- explicit safety boundaries for includes
- tests around each primitive

What we sacrificed:

- no first-class batch/pipeline runner yet
- no prompt/LLM workflow execution in core
- no variable/conditional transclusion yet
- no data-driven multi-record draft generator yet
- no workflow provenance graph tying inputs to outputs
- no multi-output orchestration
- no built-in object/data shaping between extraction and rendering

This is a good trade for the foundation, but the pipeline layer needs to exist.

## Desired Workflow Shape

A future pipeline plan should be Markdown-native and inspectable:

````markdown
# Release Note Pipeline

```yaml workflow
sources:
  decisions:
    glob: docs/adr/*.md
    extract:
      accepted:
        selector: sections[heading=Decision]
      status:
        selector: frontmatter.status

steps:
  summarize:
    kind: deterministic.template
    template: templates/release-summary.md
    data:
      decisions: ${sources.decisions.accepted}

  assisted_review:
    kind: assisted.generation
    input: ${steps.summarize.markdown}
    prompt: prompts/reviewer.md
    optional: true

outputs:
  release_notes:
    template: templates/release-notes.md
    data:
      summary: ${steps.summarize.markdown}
      review: ${steps.assisted_review.markdown}
    output: out/release-notes.md
```
````

This should remain executable without LLM support. Assisted steps should be
optional, externally supplied, and policy-aware.
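
One way to keep a plan executable without LLM support is to treat the provider as an injected callable and skip optional assisted steps when it is absent or when a policy hook refuses. The sketch below is an assumption about how such a runner could behave, not existing markitect-tool code. `Step`, `run_steps`, and the `provider` and `policy` call signatures are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    kind: str                                    # "deterministic" or "assisted"
    run: Optional[Callable[[dict], str]] = None  # used by deterministic steps
    optional: bool = False


def run_steps(steps, data, provider=None, policy=None):
    """Run steps in order; skip optional assisted steps that cannot run."""
    results, diagnostics = {}, []
    for step in steps:
        context = {"data": data, "steps": dict(results)}
        if step.kind == "deterministic":
            results[step.name] = step.run(context)
            continue
        # Assisted step: needs an externally supplied provider and policy approval.
        allowed = policy is None or policy(step.name, context)
        if provider is None or not allowed:
            if not step.optional:
                raise RuntimeError(f"required assisted step unavailable: {step.name}")
            reason = "no provider" if provider is None else "denied by policy"
            diagnostics.append(f"skipped {step.name}: {reason}")
            results[step.name] = ""  # downstream templates see an empty value
            continue
        results[step.name] = provider(step.name, context)
    return results, diagnostics


steps = [
    Step("summarize", "deterministic",
         run=lambda ctx: "\n".join(f"- {d}" for d in ctx["data"]["decisions"])),
    Step("assisted_review", "assisted", optional=True),
]
# No provider supplied: the deterministic step runs, the assisted one is skipped.
results, diagnostics = run_steps(steps, {"decisions": ["Use plain files"]})
```

Recording the skip as a diagnostic rather than failing keeps the deterministic output reproducible while still making the missing review visible.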

## Architecture Gap

The missing generalized layer needs:

- source collectors for Markdown files, globs, directories, and future indexes
- named extracted data products
- a small data expression model for referencing previous results
- deterministic step registry
- optional assisted step registry
- multi-output sinks
- provenance and diagnostics per step
- dry-run/plan/inspect modes
- caching and invalidation hooks
- policy hooks before assisted steps or sensitive output writes
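
The "small data expression model" item can be sketched as a resolver for the `${...}` references used in the example plan. This is a hedged illustration with hypothetical names (`REF`, `lookup`, `resolve`, `references`), and it assumes references are dotted paths into a nested dictionary of earlier results.

```python
import re

# Matches references such as ${sources.decisions.accepted}.
REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def lookup(path, context):
    """Walk a dotted path through nested dicts, failing loudly on a miss."""
    value = context
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            raise KeyError(f"unresolved reference: ${{{path}}}")
        value = value[part]
    return value


def resolve(value, context):
    """Replace references inside strings, lists, and dicts."""
    if isinstance(value, dict):
        return {key: resolve(item, context) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item, context) for item in value]
    if isinstance(value, str):
        whole = REF.fullmatch(value)
        if whole:
            # A string that is exactly one reference keeps the referenced type.
            return lookup(whole.group(1), context)
        return REF.sub(lambda m: str(lookup(m.group(1), context)), value)
    return value


def references(value):
    """Collect every referenced path, e.g. to order steps or drive a dry run."""
    if isinstance(value, dict):
        return set().union(*(references(item) for item in value.values()))
    if isinstance(value, list):
        return set().union(*(references(item) for item in value))
    if isinstance(value, str):
        return set(REF.findall(value))
    return set()
```

Because `references` needs no execution, the same function could back the dry-run and inspect modes: a plan can report which earlier results each step depends on before anything is read or written.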

## Relationship to Existing Workplans

- `MKTT-WP-0003` gives the primitive surface.
- `MKTT-WP-0010` gives richer references, processors, regions, and chunks.
- `MKTT-WP-0006` gives backend/provenance/cache interfaces.
- `MKTT-WP-0005` gives runtime context and form/assessment engines.
- `MKTT-WP-0011` should become the declarative pipeline/workflow layer that
  wires these together.

## Recommendation

Do not squeeze this into P3.7. P3.7 should stay focused on lightweight caching
and incremental processing for the current primitives.
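
For scale, lightweight caching of a single primitive can be as small as a content fingerprint plus a lookup. The sketch below is an assumption about what that might involve, not the P3.7 design. `fingerprint` and `StepCache` are hypothetical names, and it assumes step configuration and inputs are JSON-serializable.

```python
import hashlib
import json


def fingerprint(step_config, inputs):
    """Hash the step configuration and inputs into a stable cache key."""
    payload = json.dumps({"config": step_config, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class StepCache:
    """In-memory cache that recomputes a step only when its fingerprint changes."""

    def __init__(self):
        self._entries = {}
        self.hits = 0

    def run(self, name, step_config, inputs, compute):
        key = fingerprint(step_config, inputs)
        cached = self._entries.get(name)
        if cached is not None and cached[0] == key:
            self.hits += 1
            return cached[1]
        result = compute(inputs)
        self._entries[name] = (key, result)
        return result
```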

Create a new workplan for declarative Markdown dataflow pipelines. It should be
P1/P2: important enough not to forget, but best implemented after the reference
and processor model has at least its first architecture pass.
