# Markdown Dataflow Workflow Assessment

Date: 2026-05-04

## Question

Can `markitect-tool` support workflows that grab data from one or more Markdown
documents, process it deterministically or with optional LLM assistance, and
inject the result into one or more Markdown outputs?

## Short Answer

Partially today, but not yet as a clean framework.

The current implementation provides the right primitives:

- parse Markdown and frontmatter
- query/extract structured content
- transform documents
- compose files
- include/transclude content
- render deterministic templates
- generate stubs from contracts
- run simple generation plans
- expose a provider-neutral assisted-generation hook

However, the user still has to orchestrate these steps manually through shell
commands or Python code. There is no declarative pipeline model that says:

```text
sources -> extracted data products -> deterministic processors -> assisted
processors -> templates/generation -> multiple outputs
```

That missing layer is where markitect can become much more practical.

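To make the manual orchestration concrete, here is a minimal sketch of the glue code a user might write by hand today. It uses only the Python standard library and does not call the `markitect-tool` API; `extract_section` and `build_summary` are hypothetical helpers invented for this illustration, and the heading matching is deliberately naive.

```python
import re
from string import Template


def extract_section(markdown: str, heading: str) -> str:
    """Return the body under the first ATX heading titled `heading`.

    Hypothetical helper: it ignores fenced code blocks and Setext
    headings, so it is only a stand-in for a real extract primitive.
    """
    body, level = [], None
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*?)\s*$", line)
        if level is None:
            # Still searching for the requested heading.
            if match and match.group(2) == heading:
                level = len(match.group(1))
            continue
        # Stop at the next heading of the same or a higher level.
        if match and len(match.group(1)) <= level:
            break
        body.append(line)
    return "\n".join(body).strip()


def build_summary(sources: dict[str, str]) -> str:
    """Extract every 'Decision' section and render one summary document."""
    decisions = [
        extract_section(text, "Decision") for _, text in sorted(sources.items())
    ]
    bullets = "\n".join(f"- {d}" for d in decisions if d)
    template = Template("# Release Summary\n\n$decisions\n")
    return template.substitute(decisions=bullets)
```

Every new workflow repeats this pattern with different sources, selectors, and templates, which is the repetition a declarative pipeline layer would absorb.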
## Comparison with markitect-main

`markitect-main` had more separate experiments:

- template parser and renderer
- data-driven draft generation
- prompt/LLM quality gates
- transclusion with variables and conditionals
- batch/document-processing commands
- infospace and spaces workflows
- cache/reference graph ideas

The new implementation is better in its core shape:

- smaller, provider-neutral modules
- deterministic behavior before optional LLM use
- CLI/API parity
- contracts as a stronger rule source than raw schemas
- structured query/extract feeding generation
- explicit safety boundaries for includes
- tests around each primitive

What we sacrificed:

- no first-class batch/pipeline runner yet
- no prompt/LLM workflow execution in core
- no variable/conditional transclusion yet
- no data-driven multi-record draft generator yet
- no workflow provenance graph tying inputs to outputs
- no multi-output orchestration
- no built-in object/data shaping between extraction and rendering

This is a good trade for the foundation, but the pipeline layer needs to exist.

## Desired Workflow Shape

A future pipeline plan should be Markdown-native and inspectable:

````markdown
# Release Note Pipeline

```yaml workflow
sources:
  decisions:
    glob: docs/adr/*.md
    extract:
      accepted:
        selector: sections[heading=Decision]
      status:
        selector: frontmatter.status

steps:
  summarize:
    kind: deterministic.template
    template: templates/release-summary.md
    data:
      decisions: ${sources.decisions.accepted}

  assisted_review:
    kind: assisted.generation
    input: ${steps.summarize.markdown}
    prompt: prompts/reviewer.md
    optional: true

outputs:
  release_notes:
    template: templates/release-notes.md
    data:
      summary: ${steps.summarize.markdown}
      review: ${steps.assisted_review.markdown}
    output: out/release-notes.md
```
````

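The `${...}` references in the plan imply a small expression model. The following is a minimal sketch of how such references could be resolved against earlier results, assuming dotted paths into nested dictionaries. The `lookup` and `resolve` names and the exact reference grammar are assumptions for illustration, not part of `markitect-tool`.

```python
import re
from typing import Any

# Assumed grammar: ${dotted.path} built from letters, digits, and underscores.
_REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def lookup(path: str, context: dict[str, Any]) -> Any:
    """Walk a dotted path through nested dicts, failing loudly on a miss."""
    current: Any = context
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"unresolved reference: ${{{path}}}")
        current = current[key]
    return current


def resolve(value: Any, context: dict[str, Any]) -> Any:
    """Recursively replace ${...} references inside dicts, lists, and strings."""
    if isinstance(value, dict):
        return {k: resolve(v, context) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, context) for v in value]
    if isinstance(value, str):
        whole = _REF.fullmatch(value)
        if whole:
            # A value that is exactly one reference keeps its native type.
            return lookup(whole.group(1), context)
        # References embedded in longer text are interpolated as strings.
        return _REF.sub(lambda m: str(lookup(m.group(1), context)), value)
    return value
```

Keeping the native type for whole-value references lets a step receive a list of extracted sections rather than its string representation, while unresolved references raise instead of silently rendering as empty text.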
This should remain executable without LLM support. Assisted steps should be
optional, externally supplied, and policy-aware.

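One way to read "optional, externally supplied, and policy-aware" is sketched below. The `Provider` signature, the `allow_assisted` flag, and the result dictionary are hypothetical, and returning empty Markdown for a skipped step is an assumption about how downstream references should behave.

```python
from typing import Callable, Optional

# Hypothetical provider contract: (prompt_path, input_markdown) -> markdown.
Provider = Callable[[str, str], str]


def run_assisted_step(
    step: dict,
    input_markdown: str,
    provider: Optional[Provider] = None,
    allow_assisted: bool = True,
) -> dict:
    """Run an assisted step, or skip it when it is optional and cannot run."""
    if provider is None or not allow_assisted:
        if step.get("optional", False):
            reason = "no provider" if provider is None else "blocked by policy"
            # Assumption: a skipped step yields empty Markdown downstream.
            return {"status": "skipped", "reason": reason, "markdown": ""}
        raise RuntimeError("assisted step is required but cannot run")
    return {"status": "ok", "markdown": provider(step["prompt"], input_markdown)}
```

With no provider configured, the `assisted_review` step from the plan would report `skipped` and the rest of the pipeline would still produce its outputs.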
## Architecture Gap

The missing generalized layer needs:

- source collectors for Markdown files, globs, directories, and future indexes
- named extracted data products
- a small data expression model for referencing previous results
- deterministic step registry
- optional assisted step registry
- multi-output sinks
- provenance and diagnostics per step
- dry-run/plan/inspect modes
- caching and invalidation hooks
- policy hooks before assisted steps or sensitive output writes

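Several of these needs (references to previous results, per-step provenance, and a dry-run/plan mode) can share one dependency map. The sketch below derives an execution order from `${steps.<name>...}` references using the standard library's `graphlib` (Python 3.9+). The function names and the reference pattern are assumptions, not an existing `markitect-tool` interface.

```python
import re
from graphlib import TopologicalSorter

# Assumed reference shape: ${steps.<name>.<field>}
_STEP_REF = re.compile(r"\$\{steps\.([A-Za-z0-9_]+)\.")


def step_dependencies(spec) -> set[str]:
    """Collect the step names referenced anywhere inside a step spec."""
    if isinstance(spec, dict):
        return set().union(*(step_dependencies(v) for v in spec.values()))
    if isinstance(spec, list):
        return set().union(*(step_dependencies(v) for v in spec))
    if isinstance(spec, str):
        return set(_STEP_REF.findall(spec))
    return set()


def plan(steps: dict) -> list[str]:
    """Return step names in a runnable order without executing anything."""
    graph = {name: step_dependencies(spec) for name, spec in steps.items()}
    for name, deps in graph.items():
        unknown = deps - steps.keys()
        if unknown:
            raise KeyError(f"step {name!r} references unknown steps: {sorted(unknown)}")
    # static_order() raises graphlib.CycleError if steps depend on each other.
    return list(TopologicalSorter(graph).static_order())
```

The same `graph` mapping doubles as a coarse provenance record, since it names which earlier steps feed each step, and `plan` gives an inspectable dry run before any output is written.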
## Relationship to Existing Workplans

- `MKTT-WP-0003` gives the primitive surface.
- `MKTT-WP-0010` gives richer references, processors, regions, and chunks.
- `MKTT-WP-0006` gives backend/provenance/cache interfaces.
- `MKTT-WP-0005` gives runtime context and form/assessment engines.
- `MKTT-WP-0011` should become the declarative pipeline/workflow layer that
  wires these together.

## Recommendation

Do not squeeze this into P3.7. P3.7 should stay focused on lightweight caching
and incremental processing for the current primitives.

Create a new workplan for declarative Markdown dataflow pipelines. It should be
P1/P2: important enough not to forget, but best implemented after the reference
and processor model has at least its first architecture pass.