Added deterministic function layer

2026-05-04 19:26:25 +02:00
parent 3840ff4617
commit 1197b39a76
11 changed files with 1305 additions and 16 deletions

137
docs/document-functions.md Normal file

@@ -0,0 +1,137 @@
# Document Function Layer
Date: 2026-05-04
## Purpose
Document functions are a Markdown-native authoring surface over existing
Markitect primitives. They let a document author write small deterministic
operations inline or in fenced blocks while preserving diagnostics,
provenance, trace events, capability metadata, and extension descriptors.
The first implementation is intentionally conservative:
- deterministic functions only
- no network access
- no filesystem access
- no external processes
- no provider or assisted-generation calls
- no live flex-auth or external authorization service required
Riskier functions can be added later as optional adapters once capability and
policy gates are explicit.
## Syntax
Inline calls use:
```markdown
{{mkt:text.upper "draft"}}
```
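Fenced block calls use the `mkt-function` info string (`markitect-function` and
`function` are accepted aliases). The block body is passed to the function as
its `body` argument: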
````markdown
```mkt-function md.codeblock lang=python
print("hello")
```
````
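Function names are namespace-qualified (`text.upper`, `md.heading`). Arguments
are split with shell-style quoting and may be positional or named
(`key=value`). The tokens `true`, `false`, `null`, and integers are parsed as
literals: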
```markdown
{{mkt:md.heading text="Decision" level=2}}
```
Pipeline calls pass the previous result as the first argument of the next
function:
```markdown
{{mkt:text.upper "draft" | text.replace DRAFT Final}}
```
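The pipeline rule can be sketched in a few lines of standalone Python. This is an illustration only, not the Markitect implementation: `FUNCTIONS` and `run_pipeline` are hypothetical names, and the two lambdas stand in for the built-in `text.upper` and `text.replace`. Like the description above, it splits stages on `|` and prepends the previous output to the next stage's arguments.

```python
import shlex

# Hypothetical stand-ins for two built-in text functions.
FUNCTIONS = {
    "text.upper": lambda value: str(value).upper(),
    "text.replace": lambda value, old, new: str(value).replace(str(old), str(new)),
}


def run_pipeline(expression: str) -> str:
    """Evaluate `fn args | fn args` from left to right."""
    output = None
    stages = [part.strip() for part in expression.split("|") if part.strip()]
    for index, stage in enumerate(stages):
        # shlex handles quoted arguments such as "draft".
        name, *args = shlex.split(stage)
        if index > 0:
            # Later stages receive the previous output as their first argument.
            args = [output, *args]
        output = FUNCTIONS[name](*args)
    return output


print(run_pipeline('text.upper "draft" | text.replace DRAFT Final'))  # Final
```

Because the split on `|` is purely textual in this sketch, a quoted argument that contains `|` would be cut in two, so such values are best avoided in pipelines.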
Values of the form `${name}` are resolved from `ProcessingContext.variables`.
This keeps data binding aligned with workflow expression conventions without
creating a second workflow engine.
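A minimal sketch of the binding rule, assuming variables are held in a plain dictionary. `resolve_value` is a hypothetical standalone helper, not the library function; it only shows that a token consisting entirely of `${name}` is looked up and that unknown names fall back to an empty string.

```python
from typing import Any, Dict


def resolve_value(value: Any, variables: Dict[str, Any]) -> Any:
    """Replace a whole-token `${name}` reference with its bound value."""
    if isinstance(value, str) and value.startswith("${") and value.endswith("}"):
        key = value[2:-1].strip()
        # Unknown names resolve to an empty string instead of raising.
        return variables.get(key, "")
    # Non-string values and ordinary strings pass through unchanged.
    return value


variables = {"title": "Release Notes"}
print(resolve_value("${title}", variables))    # Release Notes
print(resolve_value("title", variables))       # title
print(repr(resolve_value("${missing}", variables)))  # ''
```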
## Built-In Functions
Initial deterministic functions:
| Function | Purpose |
| --- | --- |
| `text.upper` | Uppercase text. |
| `text.lower` | Lowercase text. |
| `text.title` | Title-case text. |
| `text.trim` | Trim surrounding whitespace. |
| `text.replace` | Replace text. |
| `text.join` | Join values with an optional separator. |
| `md.heading` | Create a Markdown heading. |
| `md.bold` | Create bold Markdown text. |
| `md.link` | Create a Markdown link. |
| `md.codeblock` | Create a fenced code block. |
| `data.get` | Read a value from processing context variables. |
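To make the table concrete, here is a rough standalone sketch of how three of these functions could behave. The names `md_heading`, `md_link`, and `text_join` are illustrative stand-ins, not the registered implementations. The sketch assumes heading levels are clamped to the range 1 to 6.

```python
from typing import Any


def md_heading(text: Any, level: int = 2) -> str:
    """Build an ATX heading, clamping the level to 1..6."""
    depth = max(1, min(6, int(level)))
    return f"{'#' * depth} {str(text).strip()}"


def md_link(text: Any, url: Any) -> str:
    """Build an inline Markdown link."""
    return f"[{text}]({url})"


def text_join(*items: Any, sep: str = "") -> str:
    """Join any number of values with an optional separator."""
    return str(sep).join(str(item) for item in items)


print(md_heading("Decision", level=2))          # ## Decision
print(md_heading("Too deep", level=9))          # ###### Too deep
print(md_link("Docs", "https://example.com"))   # [Docs](https://example.com)
print(text_join("A", "B", sep=", "))            # A, B
```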
## CLI
List functions:
```text
mkt function list
```
Validate calls without rendering. `--allow` and `--forbid` restrict which
function ids are accepted, and each may be repeated:
```text
mkt function check examples/functions/basic-functions.md
```
Render deterministic calls:
```text
mkt function render examples/functions/basic-functions.md
```
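Each command accepts `--format text|json|yaml` (default `text`). JSON and YAML
output from `render` includes calls, diagnostics, provenance, and trace data.
`render` and `check` exit with status 1 when any error diagnostic is reported.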
## Registry And Extension Fit
The function layer has its own `DocumentFunctionRegistry`. Functions are
described by `DocumentFunctionDescriptor`:
- stable id and namespace
- parameters
- output type
- execution kind
- capability declarations
- safety metadata
- examples
The built-in extension catalog exposes this layer as `document.function` with
kind `document-function`. This keeps it discoverable without replacing
processors, workflows, references, contracts, templates, or query engines.
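The registry idea can be illustrated with a trimmed-down standalone sketch. `Descriptor` and `Registry` below are hypothetical simplifications that keep only the id, summary, execution kind, and capability list. They are not the `DocumentFunctionDescriptor` or `DocumentFunctionRegistry` classes themselves.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical, reduced stand-in for a function descriptor."""

    id: str
    summary: str
    execution: str = "deterministic"
    capabilities: Tuple[str, ...] = ("document_function",)

    @property
    def namespace(self) -> str:
        # The namespace is the part of the id before the first dot.
        return self.id.split(".", 1)[0] if "." in self.id else "default"


class Registry:
    """Keeps descriptors by id and rejects duplicates."""

    def __init__(self) -> None:
        self._descriptors: Dict[str, Descriptor] = {}

    def register(self, descriptor: Descriptor) -> None:
        if descriptor.id in self._descriptors:
            raise ValueError(f"Duplicate document function `{descriptor.id}`")
        self._descriptors[descriptor.id] = descriptor

    def ids(self, namespace: Optional[str] = None) -> List[str]:
        # Sorted output keeps listings stable between runs.
        return [
            key
            for key in sorted(self._descriptors)
            if namespace is None or self._descriptors[key].namespace == namespace
        ]


registry = Registry()
registry.register(Descriptor("text.upper", "Uppercase text."))
registry.register(Descriptor("md.bold", "Create bold Markdown text."))
registry.register(Descriptor("text.lower", "Lowercase text."))
print(registry.ids("text"))  # ['text.lower', 'text.upper']
```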
## Policy And Capability Gates
The first evaluator blocks non-deterministic functions and supports local
capability blocking through `ProcessingContext.policy`, for example:
```python
ProcessingContext(policy={"blocked_capabilities": ["document_function"]})
```
Future functions that read files, access network resources, invoke external
processes, render exports, or call assisted generation must declare those
capabilities before execution. External policy services may provide decisions
through adapters later, but deterministic function execution has no external
service dependency.
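As a rough illustration of the local gate, the check can be reduced to a set intersection between the capabilities a function declares and the ids listed under `blocked_capabilities`. The helper name `blocked_capabilities` and the plain-dictionary policy are assumptions made for this sketch.

```python
from typing import Any, Dict, List


def blocked_capabilities(declared: List[str], policy: Dict[str, Any]) -> List[str]:
    """Return the declared capability ids that the local policy blocks."""
    blocked_ids = set(policy.get("blocked_capabilities") or [])
    # Sorting keeps diagnostics stable regardless of declaration order.
    return sorted(set(declared) & blocked_ids)


declared = ["document_function", "deterministic"]
print(blocked_capabilities(declared, {"blocked_capabilities": ["document_function"]}))
# ['document_function']
print(blocked_capabilities(declared, {}))
# []
```

A non-empty result means the call should be reported as blocked rather than executed.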
## Design Rules
- Stay close to Markdown and preserve CommonMark documents unless function
syntax is explicit.
- Keep deterministic execution useful without backends or providers.
- Surface diagnostics instead of silently deleting failed calls.
- Preserve source line information where available.
- Treat functions as an authoring surface over existing capabilities, not as a
second workflow engine.
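The diagnostics and line-information rules can be sketched together. The following standalone example is a simplification under stated assumptions: it supports a single hypothetical function table and one quoted argument, and the diagnostic dictionaries only imitate the shape of real diagnostics. It leaves a failed call in the output untouched and records an error with its source line instead of deleting the call.

```python
import re
from typing import Dict, List, Tuple

INLINE_CALL = re.compile(r"\{\{mkt:(?P<body>.+?)\}\}")
# Hypothetical function table with a single entry.
FUNCTIONS = {"text.upper": lambda value: str(value).upper()}


def render(text: str) -> Tuple[str, List[Dict[str, object]]]:
    """Expand known calls; keep unknown calls and report them."""
    diagnostics: List[Dict[str, object]] = []

    def replace(match: "re.Match[str]") -> str:
        name, _, arg = match.group("body").strip().partition(" ")
        if name not in FUNCTIONS:
            # 1-based line number of the call within the source text.
            line = text.count("\n", 0, match.start()) + 1
            diagnostics.append(
                {
                    "severity": "error",
                    "code": "function.unknown",
                    "line": line,
                    "message": f"Unknown document function `{name}`",
                }
            )
            return match.group(0)  # keep the failed call visible
        return FUNCTIONS[name](arg.strip().strip('"'))

    return INLINE_CALL.sub(replace, text), diagnostics


content, problems = render('Status\n{{mkt:text.upper "x"}} {{mkt:nope.fn y}}')
print(content)
print(problems)
```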


@@ -39,7 +39,7 @@ and descriptions mirror the operational view.
| `MKTT-WP-0011` | complete | done | `MKTT-WP-0003`; task-level triggers: `MKTT-WP-0010-T001`, `MKTT-WP-0010-T005` | Markdown dataflow workflow layer is complete: workflow standard, source collectors, binding model, deterministic steps, assisted boundary, safe outputs, CLI, docs, and examples. |
| `MKTT-WP-0009` | complete | done | `MKTT-WP-0006` | Access-controlled knowledge gateway is complete: local labels, trust zones, path rules, policy-aware cache query/search, decisions, diagnostics, and external adapter boundaries. |
| `MKTT-WP-0014` | complete | done | `MKTT-WP-0009` | Markitect-side enterprise IAM access-control integration is complete: NetKingdom/key-cape-compatible identity claims, flex-auth resource/policy contract, directory group resolution fixtures, decision-log sink, workflow declarations, CLI commands, and external PDP request examples. |
-| `MKTT-WP-0012` | P3 | todo | `MKTT-WP-0004`, `MKTT-WP-0010`, `MKTT-WP-0011` | Future Quarkdown-inspired document function layer: reusable Markdown-native function calls over processors, references, contracts, workflows, and later assisted steps. |
+| `MKTT-WP-0012` | complete | done | `MKTT-WP-0004`, `MKTT-WP-0010`, `MKTT-WP-0011` | Document function layer is complete: deterministic Markdown-native function descriptors, registry, inline/fenced syntax, pipelines, context bindings, CLI, docs, examples, diagnostics, provenance, and extension descriptor. |
| `MKTT-WP-0008` | P3 | todo | `MKTT-WP-0006`, `MKTT-WP-0007`, `MKTT-WP-0009` | Agent working-memory cache after backend and policy floor are available. |
## Dependency Notes
@@ -69,11 +69,10 @@ runtime/workflow expansion because it reduces central wiring and gives future
features a canonical processing context/result/diagnostic/provenance model. It
is not a business dataflow layer; that remains `MKTT-WP-0011`.
-`MKTT-WP-0012` captures the Quarkdown-inspired document function layer. It
-should follow `MKTT-WP-0011` because the workflow layer will reveal which
-operations deserve author-facing function syntax. It should remain optional and
-capability-gated, especially before assisted, external, file, or network
-functions are allowed.
+`MKTT-WP-0012` completed the Quarkdown-inspired document function layer as a
+deterministic authoring surface over existing Markitect capabilities. Assisted,
+external, file, network, render/export, and provider-backed functions remain
+future optional extensions behind local capability and policy gates.
`MKTT-WP-0014` completed Markitect-side enterprise IAM integration for the
access-control gateway. Central authorization administration remains optional


@@ -0,0 +1,15 @@
# Basic Document Functions
Status: {{mkt:text.upper "draft"}}
{{mkt:md.bold "Important"}} sections can use inline functions.
Pipeline result: {{mkt:text.upper "draft" | text.replace DRAFT Final}}
```mkt-function md.heading level=2
Generated Section
```
```mkt-function md.codeblock lang=python
print("hello from a deterministic function")
```


@@ -20,6 +20,19 @@ from markitect_tool.contract import (
validate_contract,
validate_contract_file,
)
from markitect_tool.document_function import (
DocumentFunctionCall,
DocumentFunctionDescriptor,
DocumentFunctionError,
DocumentFunctionEvaluationResult,
DocumentFunctionParameter,
DocumentFunctionRegistry,
DocumentFunctionRun,
default_document_function_registry,
parse_document_function_calls,
render_document_functions,
validate_document_functions,
)
from markitect_tool.cache import (
CacheEntry,
CacheManifest,
@@ -220,6 +233,17 @@ __all__ = [
"load_contract_file",
"validate_contract",
"validate_contract_file",
"DocumentFunctionCall",
"DocumentFunctionDescriptor",
"DocumentFunctionError",
"DocumentFunctionEvaluationResult",
"DocumentFunctionParameter",
"DocumentFunctionRegistry",
"DocumentFunctionRun",
"default_document_function_registry",
"parse_document_function_calls",
"render_document_functions",
"validate_document_functions",
"CacheEntry",
"CacheManifest",
"CacheStatus",


@@ -37,6 +37,13 @@ from markitect_tool.contract import (
load_contract_file,
validate_contract,
)
from markitect_tool.document_function import (
DocumentFunctionError,
default_document_function_registry,
render_document_functions,
validate_document_functions,
)
from markitect_tool.extension import ProcessingContext
from markitect_tool.explode import (
ExplodeError,
explode_markdown_file,
@@ -858,6 +865,77 @@ def policy_resource_manifest(manifest_file: Path, output_format: str) -> None:
_emit_resource_manifest_result({"manifest": manifest.to_dict()}, output_format)
@main.group("function")
def function_group() -> None:
"""Inspect and execute deterministic document functions."""
@function_group.command("list")
@click.option("--namespace", help="Only list functions in one namespace.")
@click.option(
"--format",
"output_format",
type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
default="text",
show_default=True,
)
def function_list(namespace: str | None, output_format: str) -> None:
"""List registered document functions."""
registry = default_document_function_registry()
functions = [descriptor.to_dict() for descriptor in registry.list(namespace=namespace)]
_emit_function_catalog({"count": len(functions), "functions": functions}, output_format)
@function_group.command("render")
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option(
"--format",
"output_format",
type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
default="text",
show_default=True,
)
def function_render(file: Path, output_format: str) -> None:
"""Render deterministic document function calls in a Markdown file."""
try:
text = file.read_text(encoding="utf-8")
result = render_document_functions(text, context=ProcessingContext(source_path=file))
except DocumentFunctionError as exc:
raise click.ClickException(str(exc)) from exc
_emit_function_result(result.to_dict(), output_format)
raise click.exceptions.Exit(0 if result.valid else 1)
@function_group.command("check")
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option("--allow", "allowed", multiple=True, help="Only allow this function id. May be repeated.")
@click.option("--forbid", "forbidden", multiple=True, help="Forbid this function id. May be repeated.")
@click.option(
"--format",
"output_format",
type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
default="text",
show_default=True,
)
def function_check(
file: Path,
allowed: tuple[str, ...],
forbidden: tuple[str, ...],
output_format: str,
) -> None:
"""Validate document function calls without rendering."""
try:
text = file.read_text(encoding="utf-8")
result = validate_document_functions(text, allowed=list(allowed), forbidden=list(forbidden))
except DocumentFunctionError as exc:
raise click.ClickException(str(exc)) from exc
_emit_function_check_result(result.to_dict(), output_format)
raise click.exceptions.Exit(0 if result.valid else 1)
@main.group("class")
def class_group() -> None:
"""Resolve deterministic content classes."""
@@ -1831,6 +1909,39 @@ def _emit_resource_manifest_result(data: dict, output_format: str) -> None:
click.echo(f"actions: {actions}")
def _emit_function_catalog(data: dict, output_format: str) -> None:
if output_format == "json":
click.echo(json.dumps(data, indent=2, ensure_ascii=False))
elif output_format == "yaml":
click.echo(yaml.safe_dump(data, sort_keys=False))
else:
for function in data.get("functions", []):
click.echo(f"{function['id']}: {function.get('summary', '')}")
def _emit_function_result(data: dict, output_format: str) -> None:
if output_format == "json":
click.echo(json.dumps(data, indent=2, ensure_ascii=False))
elif output_format == "yaml":
click.echo(yaml.safe_dump(data, sort_keys=False))
else:
click.echo(data.get("content", ""))
for diagnostic in data.get("diagnostics", []):
click.echo(f"[{diagnostic['severity']}] {diagnostic['code']}: {diagnostic['message']}")
def _emit_function_check_result(data: dict, output_format: str) -> None:
if output_format == "json":
click.echo(json.dumps(data, indent=2, ensure_ascii=False))
elif output_format == "yaml":
click.echo(yaml.safe_dump(data, sort_keys=False))
else:
click.echo("valid" if data.get("valid") else "invalid")
click.echo(f"functions: {len(data.get('calls', []))}")
for diagnostic in data.get("diagnostics", []):
click.echo(f"- [{diagnostic['severity']}] {diagnostic['code']}: {diagnostic['message']}")
def _emit_metrics(data: dict, output_format: str) -> None:
if output_format == "json":
click.echo(json.dumps(data, indent=2, ensure_ascii=False))


@@ -0,0 +1,791 @@
"""Markdown-native deterministic document functions."""
from __future__ import annotations
import json
import re
import shlex
from dataclasses import asdict, dataclass, field
from typing import Any, Callable
from markitect_tool.diagnostics import Diagnostic, SourceLocation, has_error
from markitect_tool.extension import (
ProcessingCapability,
ProcessingContext,
ProcessingProvenance,
ProcessingResult,
ProcessingTrace,
)
INLINE_CALL_RE = re.compile(r"\{\{mkt:(?P<body>.+?)\}\}", re.DOTALL)
FENCE_CALL_RE = re.compile(
r"```(?P<info>[^\n`]*)\n(?P<body>.*?)\n```",
re.DOTALL,
)
FunctionImplementation = Callable[..., Any]
class DocumentFunctionError(ValueError):
"""Raised when document function parsing or evaluation fails."""
@dataclass(frozen=True)
class DocumentFunctionParameter:
"""One declared document function parameter."""
name: str
kind: str = "string"
required: bool = True
default: Any = None
variadic: bool = False
description: str | None = None
def to_dict(self) -> dict[str, Any]:
return _drop_empty(asdict(self))
@dataclass(frozen=True)
class DocumentFunctionDescriptor:
"""Inspectable descriptor for a document function."""
id: str
summary: str
parameters: list[DocumentFunctionParameter] = field(default_factory=list)
output_type: str = "markdown"
execution: str = "deterministic"
capabilities: list[ProcessingCapability] = field(default_factory=list)
safety: dict[str, Any] = field(default_factory=dict)
examples: list[str] = field(default_factory=list)
metadata: dict[str, Any] = field(default_factory=dict)
implementation: FunctionImplementation | None = field(default=None, compare=False, repr=False)
@property
def namespace(self) -> str:
return self.id.split(".", 1)[0] if "." in self.id else "default"
def to_dict(self) -> dict[str, Any]:
return _drop_empty(
{
"id": self.id,
"namespace": self.namespace,
"summary": self.summary,
"parameters": [parameter.to_dict() for parameter in self.parameters],
"output_type": self.output_type,
"execution": self.execution,
"capabilities": [capability.to_dict() for capability in self.capabilities],
"safety": self.safety,
"examples": self.examples,
"metadata": self.metadata,
}
)
@dataclass(frozen=True)
class DocumentFunctionCall:
"""Parsed document function call."""
function_id: str
args: list[Any] = field(default_factory=list)
kwargs: dict[str, Any] = field(default_factory=dict)
body: str | None = None
raw: str = ""
inline: bool = True
line: int | None = None
pipeline: list["DocumentFunctionCall"] = field(default_factory=list)
def to_dict(self) -> dict[str, Any]:
data = asdict(self)
data["pipeline"] = [call.to_dict() for call in self.pipeline]
return _drop_empty(data)
@dataclass(frozen=True)
class DocumentFunctionRun:
"""One function call result."""
call: DocumentFunctionCall
output: Any = None
diagnostics: list[Diagnostic] = field(default_factory=list)
provenance: list[ProcessingProvenance] = field(default_factory=list)
trace: list[ProcessingTrace] = field(default_factory=list)
@property
def valid(self) -> bool:
return not has_error(self.diagnostics)
def to_dict(self) -> dict[str, Any]:
return _drop_empty(
{
"call": self.call.to_dict(),
"valid": self.valid,
"output": self.output,
"diagnostics": [diagnostic.to_dict() for diagnostic in self.diagnostics],
"provenance": [event.to_dict() for event in self.provenance],
"trace": [event.to_dict() for event in self.trace],
}
)
@dataclass(frozen=True)
class DocumentFunctionEvaluationResult:
"""Result of expanding document functions in a Markdown document."""
content: str
calls: list[DocumentFunctionRun] = field(default_factory=list)
diagnostics: list[Diagnostic] = field(default_factory=list)
provenance: list[ProcessingProvenance] = field(default_factory=list)
trace: list[ProcessingTrace] = field(default_factory=list)
@property
def valid(self) -> bool:
return not has_error(self.diagnostics)
def to_processing_result(self) -> ProcessingResult:
return ProcessingResult(
output={"content": self.content},
diagnostics=self.diagnostics,
provenance=self.provenance,
trace=self.trace,
metadata={"calls": [run.call.to_dict() for run in self.calls]},
)
def to_dict(self) -> dict[str, Any]:
return _drop_empty(
{
"valid": self.valid,
"content": self.content,
"calls": [run.to_dict() for run in self.calls],
"diagnostics": [diagnostic.to_dict() for diagnostic in self.diagnostics],
"provenance": [event.to_dict() for event in self.provenance],
"trace": [event.to_dict() for event in self.trace],
}
)
class DocumentFunctionRegistry:
"""Registry and evaluator for document functions."""
def __init__(
self,
descriptors: list[DocumentFunctionDescriptor] | None = None,
) -> None:
self._descriptors: dict[str, DocumentFunctionDescriptor] = {}
for descriptor in descriptors or []:
self.register(descriptor)
def register(self, descriptor: DocumentFunctionDescriptor) -> None:
if descriptor.id in self._descriptors:
raise DocumentFunctionError(f"Duplicate document function `{descriptor.id}`")
if descriptor.implementation is None:
raise DocumentFunctionError(f"Document function `{descriptor.id}` has no implementation")
self._descriptors[descriptor.id] = descriptor
def get(self, function_id: str) -> DocumentFunctionDescriptor:
try:
return self._descriptors[function_id]
except KeyError as exc:
raise DocumentFunctionError(f"Unknown document function `{function_id}`") from exc
def list(self, *, namespace: str | None = None) -> list[DocumentFunctionDescriptor]:
descriptors = [self._descriptors[key] for key in sorted(self._descriptors)]
if namespace is not None:
return [descriptor for descriptor in descriptors if descriptor.namespace == namespace]
return descriptors
def to_dict(self) -> dict[str, Any]:
return {
"count": len(self._descriptors),
"functions": [descriptor.to_dict() for descriptor in self.list()],
}
def evaluate_call(
self,
call: DocumentFunctionCall,
*,
context: ProcessingContext | None = None,
) -> DocumentFunctionRun:
context = context or ProcessingContext()
output: Any = None
diagnostics: list[Diagnostic] = []
provenance: list[ProcessingProvenance] = []
trace: list[ProcessingTrace] = []
calls = [call, *call.pipeline]
for index, current in enumerate(calls):
if index > 0:
current = DocumentFunctionCall(
function_id=current.function_id,
args=[output, *current.args],
kwargs=current.kwargs,
body=current.body,
raw=current.raw,
inline=current.inline,
line=current.line,
)
run = self._evaluate_single(current, context=context)
diagnostics.extend(run.diagnostics)
provenance.extend(run.provenance)
trace.extend(run.trace)
if not run.valid:
output = current.raw
break
output = run.output
return DocumentFunctionRun(
call=call,
output=output,
diagnostics=diagnostics,
provenance=provenance,
trace=trace,
)
def _evaluate_single(
self,
call: DocumentFunctionCall,
*,
context: ProcessingContext,
) -> DocumentFunctionRun:
try:
descriptor = self.get(call.function_id)
except DocumentFunctionError as exc:
return _call_error(call, "function.unknown", str(exc), context)
if descriptor.execution != "deterministic":
return _call_error(
call,
"function.execution_blocked",
f"Function `{descriptor.id}` is `{descriptor.execution}` and is not enabled.",
context,
details={"execution": descriptor.execution},
)
blocked = _blocked_capabilities(descriptor, context)
if blocked:
return _call_error(
call,
"function.capability_blocked",
f"Function `{descriptor.id}` requires blocked capabilities {blocked}.",
context,
details={"capabilities": blocked},
)
try:
args = [_resolve_value(arg, context) for arg in call.args]
kwargs = {key: _resolve_value(value, context) for key, value in call.kwargs.items()}
if call.body is not None:
kwargs.setdefault("body", _resolve_value(call.body, context))
_validate_arguments(descriptor, args, kwargs)
if descriptor.id == "data.get":
output = context.variables.get(str(args[0]), kwargs.get("default", ""))
raise _FunctionOutputReady(output)
assert descriptor.implementation is not None
output = descriptor.implementation(*args, **kwargs)
except _FunctionOutputReady as ready:
output = ready.output
except Exception as exc:
return _call_error(call, "function.evaluation_failed", str(exc), context)
provenance = [
ProcessingProvenance(
operation=f"document_function.{descriptor.id}",
source_path=str(context.source_path) if context.source_path else None,
metadata={
"function": descriptor.id,
"execution": descriptor.execution,
"output_type": descriptor.output_type,
},
)
]
trace = [
ProcessingTrace(
event="document_function.executed",
metadata={"function": descriptor.id, "line": call.line},
)
]
return DocumentFunctionRun(call=call, output=output, provenance=provenance, trace=trace)
def default_document_function_registry() -> DocumentFunctionRegistry:
"""Return built-in deterministic document functions."""
return DocumentFunctionRegistry(
[
_descriptor(
"text.upper",
"Uppercase text.",
_text_upper,
[DocumentFunctionParameter("value")],
examples=['{{mkt:text.upper "draft"}}'],
),
_descriptor(
"text.lower",
"Lowercase text.",
_text_lower,
[DocumentFunctionParameter("value")],
examples=['{{mkt:text.lower "DRAFT"}}'],
),
_descriptor(
"text.title",
"Title-case text.",
_text_title,
[DocumentFunctionParameter("value")],
examples=['{{mkt:text.title "release notes"}}'],
),
_descriptor(
"text.trim",
"Trim surrounding whitespace.",
_text_trim,
[DocumentFunctionParameter("value")],
examples=['{{mkt:text.trim " ok "}}'],
),
_descriptor(
"text.replace",
"Replace text deterministically.",
_text_replace,
[
DocumentFunctionParameter("value"),
DocumentFunctionParameter("old"),
DocumentFunctionParameter("new"),
],
examples=['{{mkt:text.replace "draft" draft final}}'],
),
_descriptor(
"text.join",
"Join text values.",
_text_join,
[
DocumentFunctionParameter("items", variadic=True),
DocumentFunctionParameter("sep", required=False, default=""),
],
examples=['{{mkt:text.join "A" "B" sep=", "}}'],
),
_descriptor(
"md.heading",
"Create a Markdown heading.",
_md_heading,
[
DocumentFunctionParameter("text", required=False),
DocumentFunctionParameter("level", kind="integer", required=False, default=2),
DocumentFunctionParameter("body", required=False),
],
examples=['{{mkt:md.heading text="Decision" level=2}}'],
),
_descriptor(
"md.bold",
"Create Markdown bold text.",
_md_bold,
[DocumentFunctionParameter("text")],
examples=['{{mkt:md.bold "Important"}}'],
),
_descriptor(
"md.link",
"Create a Markdown link.",
_md_link,
[DocumentFunctionParameter("text"), DocumentFunctionParameter("url")],
examples=['{{mkt:md.link "OpenAI" "https://openai.com"}}'],
),
_descriptor(
"md.codeblock",
"Create a fenced Markdown code block.",
_md_codeblock,
[
DocumentFunctionParameter("body", required=False),
DocumentFunctionParameter("lang", required=False, default=""),
],
examples=["```mkt-function md.codeblock lang=python\nprint('hi')\n```"],
),
_descriptor(
"data.get",
"Read a value from processing context variables.",
_data_get,
[DocumentFunctionParameter("key"), DocumentFunctionParameter("default", required=False, default="")],
examples=["{{mkt:data.get title}}"],
),
]
)
def parse_document_function_calls(text: str) -> list[DocumentFunctionCall]:
"""Parse inline and fenced document function calls."""
calls: list[DocumentFunctionCall] = []
for match in INLINE_CALL_RE.finditer(text):
line = _line_for_offset(text, match.start())
calls.append(_parse_call_expression(match.group("body"), raw=match.group(0), inline=True, line=line))
for match in FENCE_CALL_RE.finditer(text):
info = match.group("info").strip()
tokens = info.split(None, 1)
if not tokens or tokens[0] not in {"mkt-function", "markitect-function", "function"}:
continue
expression = tokens[1] if len(tokens) > 1 else ""
line = _line_for_offset(text, match.start())
calls.append(
_parse_call_expression(
expression,
raw=match.group(0),
inline=False,
line=line,
body=match.group("body"),
)
)
return calls
def render_document_functions(
    text: str,
    *,
    registry: DocumentFunctionRegistry | None = None,
    context: ProcessingContext | None = None,
) -> DocumentFunctionEvaluationResult:
    """Expand deterministic document functions in Markdown content."""

    registry = registry or default_document_function_registry()
    context = context or ProcessingContext()
    runs: list[DocumentFunctionRun] = []
    diagnostics: list[Diagnostic] = []
    provenance: list[ProcessingProvenance] = []
    trace: list[ProcessingTrace] = []

    def replace_inline(match: re.Match[str]) -> str:
        call = _parse_call_expression(
            match.group("body"),
            raw=match.group(0),
            inline=True,
            line=_line_for_offset(text, match.start()),
        )
        run = registry.evaluate_call(call, context=context)
        runs.append(run)
        diagnostics.extend(run.diagnostics)
        provenance.extend(run.provenance)
        trace.extend(run.trace)
        if not run.valid:
            # Leave the failed call in place so the author can see it.
            return match.group(0)
        return _format_function_output(run.output, inline=True)

    content = INLINE_CALL_RE.sub(replace_inline, text)

    def replace_fence(match: re.Match[str]) -> str:
        info = match.group("info").strip()
        tokens = info.split(None, 1)
        if not tokens or tokens[0] not in {"mkt-function", "markitect-function", "function"}:
            return match.group(0)
        call = _parse_call_expression(
            tokens[1] if len(tokens) > 1 else "",
            raw=match.group(0),
            inline=False,
            # Fence offsets refer to `content` (after inline expansion), not
            # `text`, so the line is computed against `content`.
            line=_line_for_offset(content, match.start()),
            body=match.group("body"),
        )
        run = registry.evaluate_call(call, context=context)
        runs.append(run)
        diagnostics.extend(run.diagnostics)
        provenance.extend(run.provenance)
        trace.extend(run.trace)
        if not run.valid:
            return match.group(0)
        return _format_function_output(run.output, inline=False)

    content = FENCE_CALL_RE.sub(replace_fence, content)
    trace.append(ProcessingTrace(event="document_function.rendered", metadata={"calls": len(runs)}))
    return DocumentFunctionEvaluationResult(
        content=content,
        calls=runs,
        diagnostics=diagnostics,
        provenance=provenance,
        trace=trace,
    )
def validate_document_functions(
text: str,
*,
registry: DocumentFunctionRegistry | None = None,
allowed: list[str] | None = None,
forbidden: list[str] | None = None,
) -> DocumentFunctionEvaluationResult:
"""Validate function calls without rendering the document."""
registry = registry or default_document_function_registry()
allowed_set = set(allowed or [])
forbidden_set = set(forbidden or [])
diagnostics: list[Diagnostic] = []
runs: list[DocumentFunctionRun] = []
for call in parse_document_function_calls(text):
if allowed_set and call.function_id not in allowed_set:
diagnostics.append(_diagnostic(call, "function.not_allowed", f"Function `{call.function_id}` is not allowed."))
if call.function_id in forbidden_set:
diagnostics.append(_diagnostic(call, "function.forbidden", f"Function `{call.function_id}` is forbidden."))
try:
descriptor = registry.get(call.function_id)
if descriptor.execution != "deterministic":
diagnostics.append(
_diagnostic(
call,
"function.unstable",
f"Function `{call.function_id}` is `{descriptor.execution}` and cannot run in deterministic contexts.",
)
)
except DocumentFunctionError as exc:
diagnostics.append(_diagnostic(call, "function.unknown", str(exc)))
runs.append(DocumentFunctionRun(call=call))
return DocumentFunctionEvaluationResult(content=text, calls=runs, diagnostics=diagnostics)
def _parse_call_expression(
expression: str,
*,
raw: str,
inline: bool,
line: int | None,
body: str | None = None,
) -> DocumentFunctionCall:
pipeline_parts = [part.strip() for part in expression.split("|") if part.strip()]
if not pipeline_parts:
raise DocumentFunctionError("Document function call is empty.")
first = _parse_single_call(pipeline_parts[0], raw=raw, inline=inline, line=line, body=body)
pipeline = [
_parse_single_call(part, raw=part, inline=inline, line=line)
for part in pipeline_parts[1:]
]
return DocumentFunctionCall(
function_id=first.function_id,
args=first.args,
kwargs=first.kwargs,
body=first.body,
raw=raw,
inline=inline,
line=line,
pipeline=pipeline,
)
def _parse_single_call(
expression: str,
*,
raw: str,
inline: bool,
line: int | None,
body: str | None = None,
) -> DocumentFunctionCall:
try:
parts = shlex.split(expression)
except ValueError as exc:
raise DocumentFunctionError(f"Invalid function syntax: {exc}") from exc
if not parts:
raise DocumentFunctionError("Document function call is empty.")
function_id = parts[0]
args: list[Any] = []
kwargs: dict[str, Any] = {}
for token in parts[1:]:
if "=" in token and not token.startswith("="):
key, value = token.split("=", 1)
kwargs[key.replace("-", "_")] = _parse_literal(value)
else:
args.append(_parse_literal(token))
return DocumentFunctionCall(
function_id=function_id,
args=args,
kwargs=kwargs,
body=body,
raw=raw,
inline=inline,
line=line,
)
def _descriptor(
function_id: str,
summary: str,
implementation: FunctionImplementation,
parameters: list[DocumentFunctionParameter],
*,
output_type: str = "markdown",
examples: list[str] | None = None,
) -> DocumentFunctionDescriptor:
return DocumentFunctionDescriptor(
id=function_id,
summary=summary,
parameters=parameters,
output_type=output_type,
capabilities=[
ProcessingCapability(id="document_function", kind="execute"),
ProcessingCapability(id="deterministic", kind="execution"),
],
safety={"network": False, "filesystem": False, "assisted_generation": False},
examples=examples or [],
implementation=implementation,
)
def _validate_arguments(
descriptor: DocumentFunctionDescriptor,
args: list[Any],
kwargs: dict[str, Any],
) -> None:
required = [parameter for parameter in descriptor.parameters if parameter.required and not parameter.variadic]
positional = [parameter for parameter in descriptor.parameters if not parameter.variadic]
variadic = next((parameter for parameter in descriptor.parameters if parameter.variadic), None)
if len(args) > len(positional) and variadic is None:
raise DocumentFunctionError(f"Function `{descriptor.id}` received too many positional arguments.")
for index, parameter in enumerate(required):
if index < len(args) or parameter.name in kwargs:
continue
raise DocumentFunctionError(f"Function `{descriptor.id}` requires `{parameter.name}`.")
def _blocked_capabilities(
    descriptor: DocumentFunctionDescriptor,
    context: ProcessingContext,
) -> list[str]:
    blocked = []
    policy = context.policy or {}
    blocked_ids = set(policy.get("blocked_capabilities") or [])
    for capability in descriptor.capabilities:
        if capability.id in blocked_ids:
            blocked.append(capability.id)
    if descriptor.safety.get("network") and policy.get("network") is False:
        blocked.append("network")
    if descriptor.safety.get("filesystem") and policy.get("filesystem") is False:
        blocked.append("filesystem")
    if descriptor.safety.get("assisted_generation") and policy.get("assisted_generation") is False:
        blocked.append("assisted_generation")
    return sorted(set(blocked))


def _resolve_value(value: Any, context: ProcessingContext) -> Any:
    if isinstance(value, str):
        if value.startswith("${") and value.endswith("}"):
            key = value[2:-1].strip()
            return context.variables.get(key, "")
    return value


def _format_function_output(value: Any, *, inline: bool) -> str:
    if isinstance(value, str):
        return value
    if isinstance(value, list):
        return ", ".join(str(item) for item in value) if inline else "\n".join(str(item) for item in value)
    if isinstance(value, dict):
        return json.dumps(value, sort_keys=True, ensure_ascii=False)
    return "" if value is None else str(value)


def _parse_literal(value: str) -> Any:
    lowered = value.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    if lowered in {"null", "none"}:
        return None
    try:
        return int(value)
    except ValueError:
        pass
    return value


def _call_error(
    call: DocumentFunctionCall,
    code: str,
    message: str,
    context: ProcessingContext,
    details: dict[str, Any] | None = None,
) -> DocumentFunctionRun:
    return DocumentFunctionRun(
        call=call,
        diagnostics=[
            Diagnostic(
                severity="error",
                code=code,
                message=message,
                source=SourceLocation(
                    path=str(context.source_path) if context.source_path else None,
                    line=call.line,
                )
                if context.source_path or call.line
                else None,
                details=details or {"function": call.function_id},
            )
        ],
    )


def _diagnostic(
    call: DocumentFunctionCall,
    code: str,
    message: str,
) -> Diagnostic:
    return Diagnostic(
        severity="error",
        code=code,
        message=message,
        source=SourceLocation(line=call.line) if call.line else None,
        details={"function": call.function_id},
    )


def _line_for_offset(text: str, offset: int) -> int:
    return text.count("\n", 0, offset) + 1


def _text_upper(value: Any) -> str:
    return str(value).upper()


def _text_lower(value: Any) -> str:
    return str(value).lower()


def _text_title(value: Any) -> str:
    return str(value).title()


def _text_trim(value: Any) -> str:
    return str(value).strip()


def _text_replace(value: Any, old: Any, new: Any) -> str:
    return str(value).replace(str(old), str(new))


def _text_join(*items: Any, sep: str = "") -> str:
    return str(sep).join(str(item) for item in items)


def _md_heading(text: Any = None, *, level: int = 2, body: Any = None) -> str:
    heading = str(text if text is not None else body if body is not None else "").strip()
    depth = max(1, min(6, int(level)))
    return f"{'#' * depth} {heading}"


def _md_bold(text: Any) -> str:
    return f"**{text}**"


def _md_link(text: Any, url: Any) -> str:
    return f"[{text}]({url})"


def _md_codeblock(body: Any = "", *, lang: str = "") -> str:
    info = str(lang).strip()
    return f"```{info}\n{body}\n```"


def _data_get(key: Any, default: Any = "", *, body: Any = None) -> Any:
    return body if body is not None else default if str(key).startswith("$") else key


class _FunctionOutputReady(Exception):
    def __init__(self, output: Any) -> None:
        self.output = output


def _drop_empty(data: dict[str, Any]) -> dict[str, Any]:
    return {
        key: value
        for key, value in data.items()
        if value not in (None, [], {}, "")
    }


@@ -18,6 +18,7 @@ def builtin_extension_registry() -> ExtensionRegistry:
        _runtime_form_state_descriptor(),
        _runtime_assessment_descriptor(),
        _local_label_policy_descriptor(),
        _document_function_descriptor(),
    ]:
        registry.register(descriptor)
    return registry
@@ -233,3 +234,34 @@ def _local_label_policy_descriptor() -> ExtensionDescriptor:
            ]
        },
    )


def _document_function_descriptor() -> ExtensionDescriptor:
    return ExtensionDescriptor(
        id="document.function",
        kind="document-function",
        summary="Markdown-native deterministic document function registry and evaluator.",
        capabilities=[
            ProcessingCapability(id="document_function", kind="execute"),
            ProcessingCapability(id="deterministic", kind="execution"),
            ProcessingCapability(id="diagnostics", kind="emit"),
            ProcessingCapability(id="provenance", kind="emit"),
        ],
        safety={
            "network": False,
            "filesystem": False,
            "assisted_generation": False,
            "external_process": False,
        },
        input_contract="Markdown with {{mkt:function ...}} or mkt-function fences",
        output_contract="DocumentFunctionEvaluationResult",
        diagnostics_namespace="document_function",
        provenance_prefix="document_function",
        cli={"commands": ["mkt function list", "mkt function check", "mkt function render"]},
        docs=["docs/document-functions.md"],
        examples=["examples/functions/basic-functions.md"],
        metadata={
            "execution": "deterministic-only",
            "external_policy_services_required": False,
        },
    )


@@ -18,6 +18,7 @@ def test_builtin_extension_registry_lists_query_processors_and_backend():
    assert "runtime.form-state" in ids
    assert "runtime.assessment" in ids
    assert "policy.local-label" in ids
    assert "document.function" in ids


def test_builtin_processor_descriptors_capture_safety_and_provenance():
@@ -103,3 +104,22 @@ def test_builtin_policy_descriptor_exposes_cli_and_adapter_boundary():
    assert "mkt policy resource-manifest" in descriptor.cli["commands"]
    assert "IdentityClaimsAdapter" in descriptor.metadata["external_adapters"]
    assert "RelationshipPolicyAdapter" in descriptor.metadata["external_adapters"]


def test_builtin_document_function_descriptor_exposes_deterministic_boundary():
    registry = builtin_extension_registry()
    descriptor = registry.get("document.function")
    assert descriptor.kind == "document-function"
    assert descriptor.safety["network"] is False
    assert descriptor.metadata["external_policy_services_required"] is False
    assert {capability.id for capability in descriptor.capabilities} >= {
        "document_function",
        "deterministic",
    }
    assert descriptor.cli["commands"] == [
        "mkt function list",
        "mkt function check",
        "mkt function render",
    ]


@@ -18,6 +18,7 @@ def test_collect_cli_command_specs_from_builtin_registry():
    assert ("processor.uppercase", "mkt process") in commands
    assert ("backend.local-sqlite", "mkt cache index") in commands
    assert ("backend.local-sqlite", "mkt search") in commands
    assert ("document.function", "mkt function render") in commands


def test_cli_command_spec_serializes_without_empty_fields():


@@ -0,0 +1,136 @@
import json
from pathlib import Path

from click.testing import CliRunner

from markitect_tool.cli import main
from markitect_tool.document_function import (
    DocumentFunctionDescriptor,
    DocumentFunctionParameter,
    DocumentFunctionRegistry,
    default_document_function_registry,
    parse_document_function_calls,
    render_document_functions,
    validate_document_functions,
)
from markitect_tool.extension import ProcessingContext


def test_parse_inline_and_fenced_function_calls():
    text = """# Demo
Inline {{mkt:text.upper "draft"}}.
```mkt-function md.heading level=3
Decision
```
"""
    calls = parse_document_function_calls(text)
    assert [call.function_id for call in calls] == ["text.upper", "md.heading"]
    assert calls[0].args == ["draft"]
    assert calls[1].kwargs == {"level": 3}
    assert calls[1].body.strip() == "Decision"


def test_render_document_functions_expands_inline_and_fenced_calls():
    text = """# Demo
Inline {{mkt:text.upper "draft"}}.
```mkt-function md.heading level=3
Decision
```
"""
    result = render_document_functions(text)
    assert result.valid
    assert "Inline DRAFT." in result.content
    assert "### Decision" in result.content
    assert len(result.calls) == 2
    assert result.provenance[0].operation == "document_function.text.upper"


def test_pipeline_passes_previous_output_to_next_function():
    result = render_document_functions('{{mkt:text.upper "draft" | text.replace DRAFT Final}}')
    assert result.valid
    assert result.content == "Final"


def test_context_variables_can_be_used_in_function_arguments():
    context = ProcessingContext(variables={"title": "Architecture Decision"})
    result = render_document_functions("{{mkt:md.heading ${title} level=2}}", context=context)
    assert result.content == "## Architecture Decision"


def test_validate_document_functions_reports_forbidden_calls():
    result = validate_document_functions("{{mkt:text.upper draft}}", forbidden=["text.upper"])
    assert not result.valid
    assert result.diagnostics[0].code == "function.forbidden"


def test_registry_can_expose_custom_function_without_core_rewrite():
    registry = DocumentFunctionRegistry()
    registry.register(
        DocumentFunctionDescriptor(
            id="demo.wrap",
            summary="Wrap text.",
            parameters=[DocumentFunctionParameter("value")],
            implementation=lambda value: f"[{value}]",
        )
    )
    result = render_document_functions("{{mkt:demo.wrap ok}}", registry=registry)
    assert result.valid
    assert result.content == "[ok]"


def test_unknown_function_is_left_in_place_with_diagnostic():
    result = render_document_functions("{{mkt:nope.missing value}}")
    assert not result.valid
    assert result.content == "{{mkt:nope.missing value}}"
    assert result.diagnostics[0].code == "function.unknown"


def test_mkt_function_list_outputs_builtin_catalog():
    result = CliRunner().invoke(main, ["function", "list", "--format", "json"])
    data = json.loads(result.output)
    assert result.exit_code == 0
    ids = {function["id"] for function in data["functions"]}
    assert {"text.upper", "md.heading", "md.codeblock"} <= ids


def test_mkt_function_render_outputs_expanded_markdown(tmp_path: Path):
    file = tmp_path / "functions.md"
    file.write_text("# Demo\n\n{{mkt:md.bold Important}}\n", encoding="utf-8")
    result = CliRunner().invoke(main, ["function", "render", str(file)])
    assert result.exit_code == 0
    assert "**Important**" in result.output


def test_mkt_function_check_can_restrict_allowed_functions(tmp_path: Path):
    file = tmp_path / "functions.md"
    file.write_text("{{mkt:text.upper draft}}\n", encoding="utf-8")
    result = CliRunner().invoke(main, ["function", "check", str(file), "--allow", "md.heading"])
    assert result.exit_code == 1
    assert "function.not_allowed" in result.output


def test_default_registry_serializes_without_implementations():
    data = default_document_function_registry().to_dict()
    assert data["count"] >= 1
    assert "implementation" not in data["functions"][0]


@@ -3,10 +3,10 @@ id: MKTT-WP-0012
type: workplan
title: "Document Function Layer"
domain: markitect
-status: todo
+status: done
owner: markitect-tool
topic_slug: markitect
-planning_priority: P3
+planning_priority: complete
planning_order: 85
depends_on_workplans:
- MKTT-WP-0004
@@ -34,6 +34,29 @@ This layer should let authors and agents express reusable document operations
as named functions over Markdown content, structured data, references,
processors, contracts, workflows, and eventually assisted generation.
## Implementation Summary
Implemented the first deterministic document function layer:
- `DocumentFunctionDescriptor`, `DocumentFunctionParameter`,
`DocumentFunctionCall`, `DocumentFunctionRegistry`, run/evaluation result
envelopes, diagnostics, provenance, and trace output.
- Conservative inline syntax: `{{mkt:function.name ...}}`.
- Conservative fenced syntax: `mkt-function function.name ...`.
- Pipeline chaining with `|`, where the previous result becomes the next
function's first argument.
- `ProcessingContext.variables` bindings through `${name}` values.
- Built-in deterministic functions for text operations, Markdown headings,
bold text, links, code blocks, and context value lookup.
- `mkt function list`, `mkt function check`, and `mkt function render`.
- Built-in extension descriptor `document.function`.
- Documentation and examples in `docs/document-functions.md` and
`examples/functions/basic-functions.md`.
Assisted, filesystem, network, external-process, render/export, and live policy
service functions remain future optional extensions gated by local capability
and policy metadata.
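
A rough sketch of how such a local gate might look, modelled on the `_blocked_capabilities` helper in this commit but written against plain dictionaries so it runs on its own. The function name `blocked_capabilities`, the `render_export` capability id, and the sample policies are assumptions made for illustration only.

```python
from typing import Any, Iterable, Optional


def blocked_capabilities(
    capability_ids: Iterable[str],
    safety: dict[str, Any],
    policy: Optional[dict[str, Any]],
) -> list[str]:
    """Return the sorted capability ids and safety flags that the policy blocks."""
    policy = policy or {}
    blocked_ids = set(policy.get("blocked_capabilities") or [])
    blocked = {cid for cid in capability_ids if cid in blocked_ids}
    for flag in ("network", "filesystem", "assisted_generation"):
        # A flag is blocked only when the function declares it
        # and the policy explicitly sets it to False.
        if safety.get(flag) and policy.get(flag) is False:
            blocked.add(flag)
    return sorted(blocked)


# Hypothetical local policy: no network, and one capability id denied outright.
offline_policy = {"network": False, "blocked_capabilities": ["render_export"]}

# A deterministic built-in declares no risky safety flags, so nothing is blocked.
builtin_blocked = blocked_capabilities(
    ["document_function", "deterministic"],
    {"network": False, "filesystem": False, "assisted_generation": False},
    offline_policy,
)

# A hypothetical future function that needs the network and exports output.
future_blocked = blocked_capabilities(["render_export"], {"network": True}, offline_policy)
```

Under these assumptions `builtin_blocked` is an empty list, while `future_blocked` is `["network", "render_export"]`. A policy that omits a flag does not block it, since the check is `is False` rather than a falsy test.
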
## Background
Quarkdown demonstrates that document authoring can benefit from a compact
@@ -102,7 +125,7 @@ a second workflow engine or a dependency on flex-auth.
```task
id: MKTT-WP-0012-T001
-status: todo
+status: done
priority: high
state_hub_task_id: "13ddfdbb-8fc1-4570-915d-038d40d489e1"
```
@@ -125,7 +148,7 @@ Output: design note, schema, and small examples.
```task
id: MKTT-WP-0012-T002
-status: todo
+status: done
priority: high
state_hub_task_id: "58166792-457a-4844-96a7-27baf50c1d7e"
```
@@ -150,7 +173,7 @@ Output: syntax proposal with accepted/rejected examples and parser impact.
```task
id: MKTT-WP-0012-T003
-status: todo
+status: done
priority: high
state_hub_task_id: "06196bde-cc10-464e-9d1a-6b8acc616c06"
```
@@ -172,7 +195,7 @@ Output: registry API, adapter protocol, and tests with fake functions.
```task
id: MKTT-WP-0012-T004
-status: todo
+status: done
priority: high
state_hub_task_id: "986121f0-f824-46eb-af59-65ebf2389f34"
```
@@ -193,7 +216,7 @@ Output: minimal evaluator and CLI/library tests.
```task
id: MKTT-WP-0012-T005
-status: todo
+status: done
priority: medium
state_hub_task_id: "94bb131c-cb4e-4391-8453-bb9de4f3834c"
```
@@ -210,7 +233,7 @@ Output: chaining rules, data binding rules, and diagnostic examples.
```task
id: MKTT-WP-0012-T006
-status: todo
+status: done
priority: medium
state_hub_task_id: "899d361f-8eaa-4098-97d9-0fd33afc3304"
```
@@ -233,7 +256,7 @@ Output: contract integration and actionable diagnostics.
```task
id: MKTT-WP-0012-T007
-status: todo
+status: done
priority: medium
state_hub_task_id: "2a51b42c-b46b-42cd-ba33-ab504100e653"
```
@@ -252,7 +275,7 @@ Output: permission model, blocked-operation diagnostics, and policy examples.
```task
id: MKTT-WP-0012-T008
-status: todo
+status: done
priority: medium
state_hub_task_id: "30358902-5564-48a2-b1e3-e400bfbe7d1a"
```