generated from coulomb/repo-seed
Parse Markdown files into a structured Python object
14
README.md
@@ -20,3 +20,17 @@ requirements documents in `wiki/`.
 
 The repo is registered with the Custodian State Hub as `markitect-tool` under
 the `markitect` domain. See `docs/state-hub-integration.md`.
+
+## Development
+
+Run the tests:
+
+```bash
+python3 -m pytest
+```
+
+Try the parser CLI from a checkout:
+
+```bash
+PYTHONPATH=src python3 -m markitect_tool parse README.md --format tree
+```
36
docs/packaging-decision.md
Normal file
@@ -0,0 +1,36 @@
# Packaging Decision

Date: 2026-05-03

## Decision

`markitect-tool` starts as a Python 3.12+ package with:

- Distribution name: `markitect-tool`
- Import package: `markitect_tool`
- CLI entry point: `mkt`
- Build backend: `setuptools`
- Test runner: `pytest`
- Source layout: `src/markitect_tool`

## Initial Dependencies

Core dependencies:

- `markdown-it-py`
- `PyYAML`
- `click>=8.0`

Optional extras:

- `query`: `jsonpath-ng`
- `tables`: `tabulate`
- `llm`: `llm-connect`
- `dev`: `pytest`

## Rationale

This follows the WP-0002 dependency classification and keeps the first
implementation focused on deterministic markdown parsing and CLI access. The
package name avoids legacy `markitect.*` imports while the `mkt` entry point
matches the PRD.
@@ -32,8 +32,10 @@ workplans/
 
 ## Follow-Up
 
-Once implementation dependencies exist, add an SBOM source and update State Hub
-with the SBOM ingestion result. This seed repo currently has no package manifest.
+SBOM source: `sbom-tools.yaml`.
+
+Initial SBOM ingest succeeded on 2026-05-03 with seven declared entries for the
+core and optional dependencies.
 
 ## Registered Extension Points
 
40
pyproject.toml
Normal file
@@ -0,0 +1,40 @@
[build-system]
requires = ["setuptools>=69"]
build-backend = "setuptools.build_meta"

[project]
name = "markitect-tool"
version = "0.1.0"
description = "Markdown-native toolkit and CLI for structured knowledge artifacts"
readme = "README.md"
requires-python = ">=3.12"
license = { text = "MIT" }
dependencies = [
  "click>=8.0",
  "markdown-it-py",
  "PyYAML",
]

[project.optional-dependencies]
dev = [
  "pytest>=8",
]
query = [
  "jsonpath-ng>=1.5",
]
tables = [
  "tabulate>=0.9",
]
llm = [
  "llm-connect @ file:///home/worsch/llm-connect",
]

[project.scripts]
mkt = "markitect_tool.cli:main"

[tool.setuptools.packages.find]
where = ["src"]

[tool.pytest.ini_options]
testpaths = ["tests"]
pythonpath = ["src"]
29
sbom-tools.yaml
Normal file
@@ -0,0 +1,29 @@
tools:
  - name: click
    ecosystem: python
    is_direct: true
    is_dev: false
  - name: markdown-it-py
    ecosystem: python
    is_direct: true
    is_dev: false
  - name: PyYAML
    ecosystem: python
    is_direct: true
    is_dev: false
  - name: pytest
    ecosystem: python
    is_direct: true
    is_dev: true
  - name: jsonpath-ng
    ecosystem: python
    is_direct: true
    is_dev: false
  - name: tabulate
    ecosystem: python
    is_direct: true
    is_dev: false
  - name: llm-connect
    ecosystem: python
    is_direct: true
    is_dev: false
21
src/markitect_tool/__init__.py
Normal file
@@ -0,0 +1,21 @@
"""Structured markdown primitives for markitect-tool."""

from markitect_tool.core import (
    ContentBlock,
    Document,
    Heading,
    MarkdownParseError,
    Section,
    parse_markdown,
    parse_markdown_file,
)

__all__ = [
    "ContentBlock",
    "Document",
    "Heading",
    "MarkdownParseError",
    "Section",
    "parse_markdown",
    "parse_markdown_file",
]
6
src/markitect_tool/__main__.py
Normal file
@@ -0,0 +1,6 @@
"""Run the `mkt` CLI with `python -m markitect_tool`."""

from markitect_tool.cli import main


main()
5
src/markitect_tool/cli/__init__.py
Normal file
@@ -0,0 +1,5 @@
"""Command-line interface for markitect-tool."""

from markitect_tool.cli.main import main

__all__ = ["main"]
44
src/markitect_tool/cli/main.py
Normal file
@@ -0,0 +1,44 @@
"""`mkt` command entry point."""

from __future__ import annotations

import json
from pathlib import Path

import click
import yaml

from markitect_tool.core import parse_markdown_file


@click.group()
@click.version_option()
def main() -> None:
    """Markdown-native toolkit for structured knowledge artifacts."""


@main.command()
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["json", "yaml", "tree"], case_sensitive=False),
    default="json",
    show_default=True,
)
def parse(file: Path, output_format: str) -> None:
    """Parse a Markdown file into a structured representation."""

    document = parse_markdown_file(file)
    data = document.to_dict()
    if output_format == "yaml":
        click.echo(yaml.safe_dump(data, sort_keys=False))
    elif output_format == "tree":
        for heading in document.headings:
            click.echo(f"{'#' * heading.level} {heading.text}")
    else:
        click.echo(json.dumps(data, indent=2, ensure_ascii=False))


if __name__ == "__main__":
    main()
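The `tree` branch of `mkt parse` prints one line per heading, repeating `#` once per heading level. A minimal standalone sketch of that formatting follows; `render_tree` and the `(level, text)` pairs are hypothetical names used only for illustration and are not part of this commit.

```python
from typing import Iterable, List, Tuple


def render_tree(headings: Iterable[Tuple[int, str]]) -> List[str]:
    """Format (level, text) pairs the way the `tree` branch echoes them."""
    # One '#' per heading level, then a space and the heading text.
    return [f"{'#' * level} {text}" for level, text in headings]


lines = render_tree([(1, "One"), (2, "Two")])
print("\n".join(lines))
```

The sketch returns a list instead of echoing directly, which keeps the formatting step easy to assert on in a test.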
14
src/markitect_tool/core/__init__.py
Normal file
@@ -0,0 +1,14 @@
"""Core markdown parsing and document model."""

from markitect_tool.core.document import ContentBlock, Document, Heading, Section
from markitect_tool.core.parser import MarkdownParseError, parse_markdown, parse_markdown_file

__all__ = [
    "ContentBlock",
    "Document",
    "Heading",
    "MarkdownParseError",
    "Section",
    "parse_markdown",
    "parse_markdown_file",
]
72
src/markitect_tool/core/document.py
Normal file
@@ -0,0 +1,72 @@
"""Structured document model for parsed Markdown."""

from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class Heading:
    """A Markdown heading with source location."""

    level: int
    text: str
    line: int

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)


@dataclass(frozen=True)
class ContentBlock:
    """A top-level Markdown content block."""

    type: str
    text: str
    line_start: int | None = None
    line_end: int | None = None
    heading_level: int | None = None

    def to_dict(self) -> dict[str, Any]:
        data = asdict(self)
        return {key: value for key, value in data.items() if value is not None}


@dataclass(frozen=True)
class Section:
    """A heading-led section."""

    heading: Heading
    blocks: list[ContentBlock] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        return {
            "heading": self.heading.to_dict(),
            "blocks": [block.to_dict() for block in self.blocks],
        }


@dataclass(frozen=True)
class Document:
    """Structured representation of a Markdown document."""

    source_path: str | None
    frontmatter: dict[str, Any]
    body: str
    blocks: list[ContentBlock]
    headings: list[Heading]
    sections: list[Section]
    tokens: list[dict[str, Any]]

    def to_dict(self) -> dict[str, Any]:
        data = {
            "source_path": self.source_path,
            "frontmatter": self.frontmatter,
            "body": self.body,
            "blocks": [block.to_dict() for block in self.blocks],
            "headings": [heading.to_dict() for heading in self.headings],
            "sections": [section.to_dict() for section in self.sections],
            "tokens": self.tokens,
        }
        return {key: value for key, value in data.items() if value is not None}
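The `to_dict` methods in `document.py` omit optional fields that were left at `None`, so serialized output only carries values that were actually set. The following is a minimal sketch of that pattern in isolation; `Block` is a hypothetical stand-in for `ContentBlock`, reduced to four fields for illustration, and uses `Optional` so the snippet does not depend on a `__future__` import.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for ContentBlock; illustration only."""

    type: str
    text: str
    line_start: Optional[int] = None
    line_end: Optional[int] = None

    def to_dict(self) -> Dict[str, Any]:
        # asdict() returns every field; keep only those that are not None.
        return {key: value for key, value in asdict(self).items() if value is not None}


paragraph = Block(type="paragraph", text="Body", line_start=3, line_end=3)
unmapped = Block(type="paragraph", text="Body")
print(paragraph.to_dict())
print(unmapped.to_dict())
```

Filtering on `is not None` rather than on truthiness keeps legitimate falsy values such as an empty `text` string in the output.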
182
src/markitect_tool/core/parser.py
Normal file
@@ -0,0 +1,182 @@
"""Markdown parsing into a stable structured representation."""

from __future__ import annotations

from pathlib import Path
from typing import Any

import yaml
from markdown_it import MarkdownIt
from markdown_it.token import Token

from markitect_tool.core.document import ContentBlock, Document, Heading, Section


class MarkdownParseError(ValueError):
    """Raised when Markdown metadata cannot be parsed safely."""


def parse_markdown_file(path: str | Path) -> Document:
    """Parse a Markdown file into a structured document."""

    file_path = Path(path)
    text = file_path.read_text(encoding="utf-8")
    return parse_markdown(text, source_path=str(file_path))


def parse_markdown(markdown: str, source_path: str | None = None) -> Document:
    """Parse Markdown text into frontmatter, blocks, headings, sections, and tokens."""

    frontmatter, body, body_line_offset = _split_frontmatter(markdown)
    tokens = _parse_tokens(body)
    blocks, headings = _blocks_and_headings(tokens, body_line_offset)
    sections = _sections_from_blocks(blocks, headings)
    return Document(
        source_path=source_path,
        frontmatter=frontmatter,
        body=body,
        blocks=blocks,
        headings=headings,
        sections=sections,
        tokens=tokens,
    )


def _split_frontmatter(markdown: str) -> tuple[dict[str, Any], str, int]:
    if not markdown.startswith("---\n"):
        return {}, markdown, 0

    end = markdown.find("\n---", 3)  # start at 3 so an empty "---\n---" block also matches
    if end == -1:
        return {}, markdown, 0

    closing_end = markdown.find("\n", end + 4)
    if closing_end == -1:
        closing_end = len(markdown)
    else:
        closing_end += 1

    raw_frontmatter = markdown[4:end]
    body = markdown[closing_end:]
    try:
        data = yaml.safe_load(raw_frontmatter) if raw_frontmatter.strip() else {}
    except yaml.YAMLError as exc:
        raise MarkdownParseError(f"Invalid YAML frontmatter: {exc}") from exc
    if data is None:
        data = {}
    if not isinstance(data, dict):
        raise MarkdownParseError("Frontmatter must be a mapping")
    body_line_offset = markdown[:closing_end].count("\n")
    return data, body, body_line_offset


def _parse_tokens(markdown: str) -> list[dict[str, Any]]:
    parser = MarkdownIt("commonmark", {"tables": True}).enable("table")
    return [_token_to_dict(token) for token in parser.parse(markdown)]


def _token_to_dict(token: Token) -> dict[str, Any]:
    data = {
        "type": token.type,
        "tag": token.tag,
        "attrs": token.attrs,
        "map": token.map,
        "nesting": token.nesting,
        "level": token.level,
        "children": [_token_to_dict(child) for child in token.children]
        if token.children
        else None,
        "content": token.content,
        "markup": token.markup,
        "info": token.info,
        "meta": token.meta,
        "block": token.block,
        "hidden": token.hidden,
    }
    return {key: value for key, value in data.items() if value is not None}


def _blocks_and_headings(
    tokens: list[dict[str, Any]], line_offset: int
) -> tuple[list[ContentBlock], list[Heading]]:
    blocks: list[ContentBlock] = []
    headings: list[Heading] = []

    for index, token in enumerate(tokens):
        token_type = token["type"]
        if token_type == "heading_open":
            inline = _next_inline(tokens, index)
            line_start, line_end = _line_range(token, line_offset)
            level = int(token.get("tag", "h1").lstrip("h") or "1")
            text = inline.get("content", "") if inline else ""
            heading = Heading(level=level, text=text, line=line_start or 1)
            headings.append(heading)
            blocks.append(
                ContentBlock(
                    type="heading",
                    text=text,
                    line_start=line_start,
                    line_end=line_end,
                    heading_level=level,
                )
            )
        elif token_type in {"paragraph_open", "bullet_list_open", "ordered_list_open", "blockquote_open", "fence", "code_block", "table_open"}:
            line_start, line_end = _line_range(token, line_offset)
            text = token.get("content", "")
            if not text and token_type.endswith("_open"):
                inline = _next_inline(tokens, index)
                text = inline.get("content", "") if inline else ""
            blocks.append(
                ContentBlock(
                    type=_block_type(token_type),
                    text=text,
                    line_start=line_start,
                    line_end=line_end,
                )
            )

    return blocks, headings


def _next_inline(tokens: list[dict[str, Any]], index: int) -> dict[str, Any] | None:
    if index + 1 < len(tokens) and tokens[index + 1]["type"] == "inline":
        return tokens[index + 1]
    return None


def _line_range(token: dict[str, Any], line_offset: int) -> tuple[int | None, int | None]:
    line_map = token.get("map")
    if not line_map:
        return None, None
    return line_map[0] + line_offset + 1, line_map[1] + line_offset


def _block_type(token_type: str) -> str:
    return {
        "paragraph_open": "paragraph",
        "bullet_list_open": "bullet_list",
        "ordered_list_open": "ordered_list",
        "blockquote_open": "blockquote",
        "fence": "code",
        "code_block": "code",
        "table_open": "table",
    }.get(token_type, token_type)


def _sections_from_blocks(
    blocks: list[ContentBlock], headings: list[Heading]
) -> list[Section]:
    sections: list[Section] = []
    current: Section | None = None
    heading_index = 0

    for block in blocks:
        if block.type == "heading":
            heading = headings[heading_index]
            heading_index += 1
            current = Section(heading=heading, blocks=[])
            sections.append(current)
        elif current is not None:
            current.blocks.append(block)

    return sections
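Because the parser strips frontmatter before tokenizing, line numbers reported for the body must be shifted back to positions in the original file. The sketch below illustrates that bookkeeping with the standard library only: `split_frontmatter` and `source_line` are hypothetical helpers written for this example, YAML parsing is deliberately left out, and the delimiter search starts at index 3 (the newline that ends the opening `---`) so an empty block is matched too.

```python
from typing import Tuple


def split_frontmatter(markdown: str) -> Tuple[str, str, int]:
    """Return (raw_frontmatter, body, body_line_offset) without parsing YAML."""
    if not markdown.startswith("---\n"):
        return "", markdown, 0
    end = markdown.find("\n---", 3)
    if end == -1:
        return "", markdown, 0
    # Skip past the closing delimiter line, including its newline if present.
    closing_end = markdown.find("\n", end + 4)
    closing_end = len(markdown) if closing_end == -1 else closing_end + 1
    body = markdown[closing_end:]
    # Every newline before the body is one source line the body was shifted by.
    return markdown[4:end], body, markdown[:closing_end].count("\n")


def source_line(body_line_index: int, offset: int) -> int:
    """Map a 0-based line index within the body to a 1-based file line."""
    return body_line_index + offset + 1


text = "---\ntitle: Demo\n---\n\n# Heading\n"
raw, body, offset = split_frontmatter(text)
heading_index = body.split("\n").index("# Heading")
print(raw, offset, source_line(heading_index, offset))
```

Here the heading sits at index 1 of the body but on line 5 of the file, which is the same shift the contract test checks when it expects the first heading on line 11.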
89
tests/test_parse_contract.py
Normal file
@@ -0,0 +1,89 @@
from pathlib import Path

import pytest
from click.testing import CliRunner

from markitect_tool import MarkdownParseError, parse_markdown, parse_markdown_file
from markitect_tool.cli import main


def test_parse_markdown_preserves_headings_and_paragraphs():
    document = parse_markdown("# Heading\n\nThis is a paragraph.")

    assert document.frontmatter == {}
    assert document.headings[0].level == 1
    assert document.headings[0].text == "Heading"
    assert [block.type for block in document.blocks] == ["heading", "paragraph"]
    assert document.sections[0].heading.text == "Heading"
    assert document.sections[0].blocks[0].text == "This is a paragraph."
    assert document.tokens[0]["type"] == "heading_open"


def test_parse_markdown_extracts_yaml_frontmatter():
    markdown = """---
title: YAML Frontmatter Test Document
tags:
  - yaml
  - frontmatter
published: true
nested:
  priority: high
---

# YAML Frontmatter Test Document

Body text.
"""

    document = parse_markdown(markdown)

    assert document.frontmatter["title"] == "YAML Frontmatter Test Document"
    assert document.frontmatter["tags"] == ["yaml", "frontmatter"]
    assert document.frontmatter["published"] is True
    assert document.frontmatter["nested"]["priority"] == "high"
    assert document.headings[0].line == 11
    assert document.body.lstrip().startswith("# YAML Frontmatter Test Document")


def test_parse_markdown_without_frontmatter_is_graceful():
    document = parse_markdown("# Document Without Frontmatter\n\nText.")

    assert document.frontmatter == {}
    assert document.headings[0].text == "Document Without Frontmatter"


def test_parse_markdown_rejects_non_mapping_frontmatter():
    with pytest.raises(MarkdownParseError, match="Frontmatter must be a mapping"):
        parse_markdown("---\n- nope\n---\n\n# Bad")


def test_parse_markdown_file_records_source_path(tmp_path: Path):
    source = tmp_path / "doc.md"
    source.write_text("# Test Document\n\nBody", encoding="utf-8")

    document = parse_markdown_file(source)

    assert document.source_path == str(source)
    assert document.headings[0].text == "Test Document"


def test_mkt_parse_outputs_json(tmp_path: Path):
    source = tmp_path / "doc.md"
    source.write_text("# Test Document\n\nBody", encoding="utf-8")

    result = CliRunner().invoke(main, ["parse", str(source)])

    assert result.exit_code == 0
    assert '"headings"' in result.output
    assert "Test Document" in result.output


def test_mkt_parse_outputs_tree(tmp_path: Path):
    source = tmp_path / "doc.md"
    source.write_text("# One\n\n## Two\n", encoding="utf-8")

    result = CliRunner().invoke(main, ["parse", str(source), "--format", "tree"])

    assert result.exit_code == 0
    assert "# One" in result.output
    assert "## Two" in result.output
@@ -58,7 +58,7 @@ migration assessment, and implementation.
 
 ```task
 id: MKTT-WP-0001-T004
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "c15f8492-93d0-43aa-ba12-0d4aaff97c03"
 ```
@@ -67,11 +67,13 @@ Choose package/module names, Python version target, dependency manager, and
 test runner. Keep the decision lightweight and aligned with the future `mkt`
 CLI entry point.
 
+Output: `docs/packaging-decision.md`.
+
 ## P1.5 - Add SBOM source once manifests exist
 
 ```task
 id: MKTT-WP-0001-T005
-status: blocked
+status: done
 priority: medium
 state_hub_task_id: "e77a5e46-aaa2-4717-922f-a871fa2fd1cc"
 ```
@@ -79,4 +81,4 @@ state_hub_task_id: "e77a5e46-aaa2-4717-922f-a871fa2fd1cc"
 After packaging files are introduced, generate or identify the SBOM source and
 update State Hub registration metadata.
 
-Blocked because the repository has no implementation package manifest yet.
+Output: `sbom-tools.yaml`; initial State Hub ingest succeeded on 2026-05-03.
@@ -22,7 +22,7 @@ contract and the `mkt` CLI.
 
 ```task
 id: MKTT-WP-0003-T001
-status: todo
+status: done
 priority: high
 state_hub_task_id: "9d9501fe-6809-4b4f-bda6-ec5e5952ddc7"
 ```
@@ -30,11 +30,13 @@ state_hub_task_id: "9d9501fe-6809-4b4f-bda6-ec5e5952ddc7"
 Create project metadata, package layout, test structure, and a minimal CLI
 entry point that can be installed or run locally.
 
+Output: `pyproject.toml`, `src/markitect_tool/`, `tests/`.
+
 ## P3.2 - Implement structured markdown parse contract
 
 ```task
 id: MKTT-WP-0003-T002
-status: todo
+status: done
 priority: high
 state_hub_task_id: "7dead033-e249-46b0-9eb3-908ae231a987"
 ```
@@ -43,6 +45,9 @@ Implement FR-001 and FR-002: parse markdown files, preserve headings,
 frontmatter, sections, and content blocks, and expose structured output via
 API and CLI.
 
+Initial implementation complete for Markdown files, YAML frontmatter, headings,
+sections, content blocks, parser tokens, API access, and `mkt parse`.
+
 ## P3.3 - Implement schema load and validation
 
 ```task
@@ -108,10 +113,12 @@ Implement FR-070 and FR-071 after the parse/schema contracts are stable.
 
 ```task
 id: MKTT-WP-0003-T008
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "94067c7e-e68b-45be-a1d6-90547eb15422"
 ```
 
 Resolve `TD-MKTT-001` by adding the implementation scaffold: package metadata,
 module layout, test runner, and `mkt` CLI entry point.
+
+Resolved by the initial package scaffold.