SQLite-backed local snapshot store

This commit is contained in:
2026-05-04 08:56:41 +02:00
parent 0d1ad21a9f
commit 36ff4cedab
7 changed files with 926 additions and 5 deletions

View File

@@ -0,0 +1,87 @@
# Local Index Backend
`markitect-tool` now includes a local SQLite snapshot/index backend as the
first practical implementation of the optional backend fabric.
## Purpose
The local index is optimized for repeatable Markdown infrastructure work:
- persist parsed document snapshots
- keep cheap source metadata for incremental refresh planning
- store document JSON for later AST/JSONPath use
- index frontmatter, headings, sections, blocks, and metrics
- preserve extension points for dependency edges, references, named regions,
chunks, processor outputs, FTS, and policy-aware access
The backend is optional. Single-file commands such as `mkt parse`, `mkt query`,
and `mkt ast` do not require it.
## Commands
Initialize the SQLite store:
```text
mkt cache init --root .
```
Build or refresh the local index:
```text
mkt cache index docs workplans --root .
```
Inspect a parsed AST without using the cache:
```text
mkt ast show docs/backend-fabric.md --format tree
mkt ast stats docs/backend-fabric.md
```
By default, the index is written to:
```text
.markitect/cache/index.sqlite3
```
Use `--index-path` to override it.
## Refresh Behavior
`mkt cache index` uses the same cheap-first refresh planning model as
`mkt backend refresh-plan`:
1. Compare path, size, mtime, parser identity, parse options, and contract hash.
2. Hash only files whose metadata changed.
3. Skip parse/index when metadata changed but content hash stayed the same.
4. Parse and index new or changed files.
5. Delete rows for removed source files.
The command reports planned work and actual work separately in JSON/YAML output.
## Stored Data
The first schema stores:
- `sources`: path, absolute path, size, mtime, content hash, snapshot id,
parser identity, parse option hash, contract hash, document JSON,
frontmatter JSON, metrics JSON, provenance JSON, and indexed flag
- `headings`: heading level, text, and source line
- `sections`: heading metadata, section text, and source span
- `blocks`: block type, text, source span, and heading level
- `dependencies`: reserved dependency edge table for references,
transclusion, literate chunks, and future invalidation graphs
This is enough to recover the useful markitect-main idea of keeping parsed
structure available for faster and richer query backends, while keeping the
normal CLI usable without a cache.
## Future Work
`MKTT-WP-0007` still needs:
- JSONPath query adapter over stored or live document JSON
- FTS5 search over section/block rows
- cache-backed query commands
- richer dependency extraction from references, transclusion, and literate
chunks

View File

@@ -33,7 +33,7 @@ and descriptions mirror the operational view.
| `MKTT-WP-0003` | complete | done | `MKTT-WP-0001`, `MKTT-WP-0002`, `MKTT-WP-0004` | Core toolkit implementation is complete. |
| `MKTT-WP-0006` | complete | done | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T005` | Optional backend fabric is complete: manifests, capabilities, snapshot identity, interfaces, registry, provenance, and read-only CLI scaffolding. |
| `MKTT-WP-0010` | complete | done | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Content references, processors, explode/implode, weave/tangle, content classes, and migration examples are complete as the first WP-0010 extension layer. |
| `MKTT-WP-0007` | P2 | todo | `MKTT-WP-0006` | First practical cache backend use case: AST/JSONPath/SQLite/FTS. Preliminary refresh planning is in place as the performance contract. |
| `MKTT-WP-0007` | P2 | todo | `MKTT-WP-0006` | First practical cache backend use case: AST/JSONPath/SQLite/FTS. SQLite snapshots, AST inspection, metadata indexing, and incremental refresh are in place; JSONPath, FTS, and cache-backed query remain. |
| `MKTT-WP-0005` | P2 | todo | `MKTT-WP-0003`, `MKTT-WP-0004` | Pick up when generation/form/context or semantic assessment pressure appears. |
| `MKTT-WP-0011` | P2 | todo | `MKTT-WP-0003`; task-level triggers: `MKTT-WP-0010-T001`, `MKTT-WP-0010-T005` | Declarative Markdown dataflow workflows: source extraction, deterministic/assisted processing, and multi-output generation. |
| `MKTT-WP-0009` | P2 | todo | `MKTT-WP-0006` | Establish access-control gateway before security-sensitive cache/context use. |

View File

@@ -32,6 +32,13 @@ from markitect_tool.backend.interfaces import (
QueryAdapter,
SnapshotBackend,
)
from markitect_tool.backend.local_store import (
DEFAULT_LOCAL_INDEX_PATH,
LOCAL_INDEX_SCHEMA_VERSION,
LocalIndexBuildResult,
LocalSnapshotStore,
local_index_path_for,
)
__all__ = [
"BACKEND_CAPABILITIES",
@@ -60,4 +67,9 @@ __all__ = [
"ProcessorResultStore",
"QueryAdapter",
"SnapshotBackend",
"DEFAULT_LOCAL_INDEX_PATH",
"LOCAL_INDEX_SCHEMA_VERSION",
"LocalIndexBuildResult",
"LocalSnapshotStore",
"local_index_path_for",
]

View File

@@ -0,0 +1,510 @@
"""Local SQLite snapshot and metadata store."""
from __future__ import annotations
import json
import sqlite3
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any
from markitect_tool.backend.engine import (
EMPTY_PARSE_OPTIONS_HASH,
PARSER_ID,
PARSER_VERSION,
DependencyEdge,
ProvenanceEnvelope,
snapshot_identity_for_file,
)
from markitect_tool.backend.planning import SnapshotState, plan_snapshot_refresh
from markitect_tool.cache import scan_markdown_files
from markitect_tool.contract import collect_metrics
from markitect_tool.core import parse_markdown_file
DEFAULT_LOCAL_INDEX_PATH = ".markitect/cache/index.sqlite3"
LOCAL_INDEX_SCHEMA_VERSION = "1"
@dataclass(frozen=True)
class LocalIndexBuildResult:
"""Summary of a local index build or refresh."""
index_path: str
root: str
paths: list[str]
planned: dict[str, Any]
parsed: list[str] = field(default_factory=list)
indexed: list[str] = field(default_factory=list)
metadata_updated: list[str] = field(default_factory=list)
deleted: list[str] = field(default_factory=list)
@property
def dirty(self) -> bool:
return bool(self.parsed or self.indexed or self.metadata_updated or self.deleted)
def to_dict(self) -> dict[str, Any]:
data = asdict(self)
data["dirty"] = self.dirty
data["counts"] = {
"parsed": len(self.parsed),
"indexed": len(self.indexed),
"metadata_updated": len(self.metadata_updated),
"deleted": len(self.deleted),
}
return data
class LocalSnapshotStore:
"""SQLite-backed local snapshot store for parsed Markdown documents."""
def __init__(self, path: str | Path = DEFAULT_LOCAL_INDEX_PATH) -> None:
self.path = Path(path)
def initialize(self) -> None:
"""Create or migrate the local index schema."""
self.path.parent.mkdir(parents=True, exist_ok=True)
with self._connect() as conn:
_create_schema(conn)
conn.execute(
"""
insert into meta(key, value) values('schema_version', ?)
on conflict(key) do update set value = excluded.value
""",
(LOCAL_INDEX_SCHEMA_VERSION,),
)
def load_state(self) -> list[SnapshotState]:
"""Load cheap refresh-planning state without loading document JSON."""
if not self.path.exists():
return []
with self._connect() as conn:
_create_schema(conn)
rows = conn.execute(
"""
select path, size, mtime_ns, content_hash, snapshot_id, parser,
parser_version, parse_options_hash, contract_hash, indexed
from sources
order by path
"""
).fetchall()
dependencies = _load_dependencies(conn)
return [
SnapshotState(
path=row["path"],
size=row["size"],
mtime_ns=row["mtime_ns"],
content_hash=row["content_hash"],
snapshot_id=row["snapshot_id"],
parser=row["parser"],
parser_version=row["parser_version"],
parse_options_hash=row["parse_options_hash"],
contract_hash=row["contract_hash"],
indexed=bool(row["indexed"]),
dependencies=dependencies.get(row["path"], []),
)
for row in rows
]
def put_file(
self,
path: str | Path,
*,
root: str | Path = ".",
parse_options: dict[str, Any] | None = None,
contract_hash: str | None = None,
) -> SnapshotState:
"""Parse and persist one Markdown file."""
self.initialize()
file_path = Path(path)
root_path = Path(root).resolve()
relative_path = _relative(file_path, root_path)
identity = snapshot_identity_for_file(
file_path,
parse_options=parse_options,
contract_hash=contract_hash,
)
document = parse_markdown_file(file_path)
metrics = collect_metrics(document).to_dict()
stat = file_path.stat()
now = datetime.now(timezone.utc).isoformat()
provenance = ProvenanceEnvelope(
operation="local_snapshot_store.put_file",
snapshot_id=identity.snapshot_id,
source_path=relative_path,
content_hash=identity.content_hash,
backend_id="local-sqlite",
)
with self._connect() as conn:
_create_schema(conn)
conn.execute(
"""
insert into sources(
path, abs_path, size, mtime_ns, content_hash, snapshot_id,
parser, parser_version, parse_options_hash, contract_hash,
indexed, document_json, frontmatter_json, metrics_json,
provenance_json, updated_at
) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 1, ?, ?, ?, ?, ?)
on conflict(path) do update set
abs_path = excluded.abs_path,
size = excluded.size,
mtime_ns = excluded.mtime_ns,
content_hash = excluded.content_hash,
snapshot_id = excluded.snapshot_id,
parser = excluded.parser,
parser_version = excluded.parser_version,
parse_options_hash = excluded.parse_options_hash,
contract_hash = excluded.contract_hash,
indexed = excluded.indexed,
document_json = excluded.document_json,
frontmatter_json = excluded.frontmatter_json,
metrics_json = excluded.metrics_json,
provenance_json = excluded.provenance_json,
updated_at = excluded.updated_at
""",
(
relative_path,
str(file_path.resolve()),
stat.st_size,
stat.st_mtime_ns,
identity.content_hash,
identity.snapshot_id,
identity.parser,
identity.parser_version,
identity.parse_options_hash,
identity.contract_hash,
_json(document.to_dict()),
_json(document.frontmatter),
_json(metrics),
_json(provenance.to_dict()),
now,
),
)
_replace_document_units(conn, relative_path, identity.snapshot_id, document.to_dict())
return SnapshotState(
path=relative_path,
size=stat.st_size,
mtime_ns=stat.st_mtime_ns,
content_hash=identity.content_hash,
snapshot_id=identity.snapshot_id,
parser=identity.parser,
parser_version=identity.parser_version,
parse_options_hash=identity.parse_options_hash,
contract_hash=identity.contract_hash,
indexed=True,
)
def update_metadata(self, path: str, *, root: str | Path = ".") -> None:
"""Update file size and mtime when content hash is unchanged."""
file_path = Path(root) / path
stat = file_path.stat()
with self._connect() as conn:
_create_schema(conn)
conn.execute(
"update sources set size = ?, mtime_ns = ?, updated_at = ? where path = ?",
(stat.st_size, stat.st_mtime_ns, datetime.now(timezone.utc).isoformat(), path),
)
def delete_path(self, path: str) -> None:
"""Delete one indexed source and derived rows."""
if not self.path.exists():
return
with self._connect() as conn:
_create_schema(conn)
conn.execute("delete from blocks where path = ?", (path,))
conn.execute("delete from sections where path = ?", (path,))
conn.execute("delete from headings where path = ?", (path,))
conn.execute("delete from dependencies where path = ?", (path,))
conn.execute("delete from sources where path = ?", (path,))
def get_document(self, path: str) -> dict[str, Any]:
"""Return stored document JSON for a relative source path."""
with self._connect() as conn:
_create_schema(conn)
row = conn.execute(
"select document_json from sources where path = ?",
(path,),
).fetchone()
if row is None:
raise KeyError(f"No indexed document `{path}`")
return json.loads(row["document_json"])
def build(
self,
paths: list[str | Path],
*,
root: str | Path = ".",
recursive: bool = True,
parse_options: dict[str, Any] | None = None,
contract_hash: str | None = None,
verify_hashes: bool = True,
) -> LocalIndexBuildResult:
"""Incrementally build or refresh the local index."""
self.initialize()
root_path = Path(root).resolve()
plan = plan_snapshot_refresh(
paths,
previous=self.load_state(),
root=root_path,
recursive=recursive,
parse_options=parse_options,
contract_hash=contract_hash,
verify_hashes=verify_hashes,
)
current_files = {
_relative(path, root_path): path
for path in scan_markdown_files(paths, recursive=recursive)
}
parsed: list[str] = []
indexed: list[str] = []
metadata_updated: list[str] = []
deleted: list[str] = []
for entry in plan.entries:
if "delete" in entry.actions:
self.delete_path(entry.path)
deleted.append(entry.path)
continue
if "parse" in entry.actions or "index" in entry.actions:
file_path = current_files.get(entry.path)
if file_path is None:
continue
self.put_file(
file_path,
root=root_path,
parse_options=parse_options,
contract_hash=contract_hash,
)
if "parse" in entry.actions:
parsed.append(entry.path)
if "index" in entry.actions:
indexed.append(entry.path)
continue
if "metadata" in entry.actions:
self.update_metadata(entry.path, root=root_path)
metadata_updated.append(entry.path)
return LocalIndexBuildResult(
index_path=str(self.path),
root=str(root_path),
paths=[str(path) for path in paths],
planned=plan.to_dict(),
parsed=parsed,
indexed=indexed,
metadata_updated=metadata_updated,
deleted=deleted,
)
def _connect(self) -> sqlite3.Connection:
conn = sqlite3.connect(self.path)
conn.row_factory = sqlite3.Row
conn.execute("pragma foreign_keys = on")
return conn
def local_index_path_for(root: str | Path, index_path: str | Path | None = None) -> Path:
"""Return the local SQLite index path for a root and optional override."""
path = Path(index_path or DEFAULT_LOCAL_INDEX_PATH)
if path.is_absolute():
return path
return Path(root) / path
def _create_schema(conn: sqlite3.Connection) -> None:
conn.executescript(
"""
create table if not exists meta(
key text primary key,
value text not null
);
create table if not exists sources(
path text primary key,
abs_path text not null,
size integer not null,
mtime_ns integer not null,
content_hash text not null,
snapshot_id text not null unique,
parser text not null,
parser_version text not null,
parse_options_hash text not null,
contract_hash text,
indexed integer not null default 1,
document_json text not null,
frontmatter_json text not null,
metrics_json text not null,
provenance_json text not null,
updated_at text not null
);
create table if not exists headings(
snapshot_id text not null,
path text not null,
idx integer not null,
level integer not null,
text text not null,
line integer not null,
primary key(snapshot_id, idx)
);
create table if not exists sections(
snapshot_id text not null,
path text not null,
idx integer not null,
heading_text text not null,
heading_level integer not null,
line integer not null,
text text not null,
line_start integer,
line_end integer,
primary key(snapshot_id, idx)
);
create table if not exists blocks(
snapshot_id text not null,
path text not null,
idx integer not null,
type text not null,
text text not null,
line_start integer,
line_end integer,
heading_level integer,
primary key(snapshot_id, idx)
);
create table if not exists dependencies(
path text not null,
source_id text not null,
target text not null,
kind text not null,
target_snapshot_id text,
metadata_json text not null default '{}'
);
create index if not exists idx_sources_content_hash on sources(content_hash);
create index if not exists idx_sources_snapshot_id on sources(snapshot_id);
create index if not exists idx_sources_parser on sources(parser, parser_version);
create index if not exists idx_headings_path on headings(path);
create index if not exists idx_sections_path on sections(path);
create index if not exists idx_blocks_path on blocks(path);
create index if not exists idx_dependencies_target on dependencies(target);
"""
)
def _replace_document_units(
conn: sqlite3.Connection,
path: str,
snapshot_id: str,
document: dict[str, Any],
) -> None:
conn.execute("delete from blocks where path = ?", (path,))
conn.execute("delete from sections where path = ?", (path,))
conn.execute("delete from headings where path = ?", (path,))
for idx, heading in enumerate(document.get("headings", [])):
conn.execute(
"""
insert into headings(snapshot_id, path, idx, level, text, line)
values (?, ?, ?, ?, ?, ?)
""",
(
snapshot_id,
path,
idx,
int(heading["level"]),
str(heading["text"]),
int(heading["line"]),
),
)
for idx, section in enumerate(document.get("sections", [])):
heading = section["heading"]
text = "\n\n".join(str(block.get("text", "")) for block in section.get("blocks", []))
line_start = _first_present(block.get("line_start") for block in section.get("blocks", []))
line_end = _last_present(block.get("line_end") for block in section.get("blocks", []))
conn.execute(
"""
insert into sections(
snapshot_id, path, idx, heading_text, heading_level, line,
text, line_start, line_end
) values (?, ?, ?, ?, ?, ?, ?, ?, ?)
""",
(
snapshot_id,
path,
idx,
str(heading["text"]),
int(heading["level"]),
int(heading["line"]),
text,
line_start,
line_end,
),
)
for idx, block in enumerate(document.get("blocks", [])):
conn.execute(
"""
insert into blocks(
snapshot_id, path, idx, type, text, line_start, line_end, heading_level
) values (?, ?, ?, ?, ?, ?, ?, ?)
""",
(
snapshot_id,
path,
idx,
str(block["type"]),
str(block.get("text", "")),
block.get("line_start"),
block.get("line_end"),
block.get("heading_level"),
),
)
def _load_dependencies(conn: sqlite3.Connection) -> dict[str, list[DependencyEdge]]:
rows = conn.execute(
"""
select path, source_id, target, kind, target_snapshot_id, metadata_json
from dependencies
order by path, source_id, target
"""
).fetchall()
dependencies: dict[str, list[DependencyEdge]] = {}
for row in rows:
dependencies.setdefault(row["path"], []).append(
DependencyEdge(
source_id=row["source_id"],
target=row["target"],
kind=row["kind"],
target_snapshot_id=row["target_snapshot_id"],
metadata=json.loads(row["metadata_json"] or "{}"),
)
)
return dependencies
def _relative(path: Path, root: Path) -> str:
resolved = path.resolve()
try:
return resolved.relative_to(root).as_posix()
except ValueError:
return resolved.as_posix()
def _json(data: Any) -> str:
return json.dumps(data, sort_keys=True, ensure_ascii=False)
def _first_present(values: Any) -> int | None:
for value in values:
if value is not None:
return int(value)
return None
def _last_present(values: Any) -> int | None:
found: int | None = None
for value in values:
if value is not None:
found = int(value)
return found

View File

@@ -18,8 +18,10 @@ from markitect_tool.cache import (
)
from markitect_tool.backend import (
BackendRegistryError,
LocalSnapshotStore,
load_backend_registry,
load_snapshot_state_file,
local_index_path_for,
plan_snapshot_refresh,
snapshot_identity_for_file,
)
@@ -95,6 +97,51 @@ def parse(file: Path, output_format: str) -> None:
click.echo(json.dumps(data, indent=2, ensure_ascii=False))
@main.group()
def ast() -> None:
"""Inspect parsed Markdown ASTs and parser summaries."""
@ast.command("show")
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option(
"--format",
"output_format",
type=click.Choice(["json", "yaml", "tree"], case_sensitive=False),
default="json",
show_default=True,
)
def ast_show(file: Path, output_format: str) -> None:
"""Show a parsed Markdown AST without requiring a cache."""
document = parse_markdown_file(file)
data = document.to_dict()
if output_format == "yaml":
click.echo(yaml.safe_dump(data, sort_keys=False))
elif output_format == "tree":
for heading in document.headings:
click.echo(f"{'#' * heading.level} {heading.text}")
else:
click.echo(json.dumps(data, indent=2, ensure_ascii=False))
@ast.command("stats")
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option(
"--format",
"output_format",
type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
default="text",
show_default=True,
)
def ast_stats(file: Path, output_format: str) -> None:
"""Summarize parsed Markdown AST shape and token distribution."""
document = parse_markdown_file(file)
data = _ast_stats(document.to_dict(), str(file))
_emit_ast_stats(data, output_format)
@main.command()
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option(
@@ -726,6 +773,40 @@ def cache() -> None:
"""Fingerprint Markdown files and detect changed inputs."""
@cache.command("init")
@click.option(
"--root",
type=click.Path(exists=True, file_okay=False, path_type=Path),
default=Path("."),
show_default=True,
help="Root used for the default local index path.",
)
@click.option(
"--index-path",
type=click.Path(dir_okay=False, path_type=Path),
help="SQLite index path. Defaults to .markitect/cache/index.sqlite3 under root.",
)
@click.option(
"--format",
"output_format",
type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
default="text",
show_default=True,
)
def cache_init(root: Path, index_path: Path | None, output_format: str) -> None:
"""Initialize the local SQLite snapshot/index store."""
resolved_index = local_index_path_for(root, index_path)
store = LocalSnapshotStore(resolved_index)
store.initialize()
data = {
"index_path": str(resolved_index),
"schema_version": "1",
"sources": len(store.load_state()),
}
_emit_local_index_data(data, output_format)
@cache.command("fingerprint")
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option(
@@ -833,6 +914,68 @@ def cache_status(
raise click.exceptions.Exit(1 if status.dirty else 0)
@cache.command("index")
@click.argument("paths", nargs=-1, required=True, type=click.Path(exists=True, path_type=Path))
@click.option(
"--root",
type=click.Path(exists=True, file_okay=False, path_type=Path),
default=Path("."),
show_default=True,
help="Root used for relative index paths.",
)
@click.option(
"--index-path",
type=click.Path(dir_okay=False, path_type=Path),
help="SQLite index path. Defaults to .markitect/cache/index.sqlite3 under root.",
)
@click.option("--no-recursive", is_flag=True, help="Do not recurse into directories.")
@click.option(
"--no-verify-hashes",
is_flag=True,
help="Do not hash metadata-changed files before parsing.",
)
@click.option(
"--parse-option",
"parse_options",
multiple=True,
metavar="KEY=VALUE",
help="Parse option included in the snapshot identity hash.",
)
@click.option("--contract-hash", help="Optional contract hash included in snapshot identity.")
@click.option(
"--format",
"output_format",
type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
default="text",
show_default=True,
)
def cache_index(
paths: tuple[Path, ...],
root: Path,
index_path: Path | None,
no_recursive: bool,
no_verify_hashes: bool,
parse_options: tuple[str, ...],
contract_hash: str | None,
output_format: str,
) -> None:
"""Build or refresh the local SQLite snapshot/index store."""
try:
store = LocalSnapshotStore(local_index_path_for(root, index_path))
result = store.build(
list(paths),
root=root,
recursive=not no_recursive,
parse_options=_parse_key_value_options(parse_options),
contract_hash=contract_hash,
verify_hashes=not no_verify_hashes,
)
except ValueError as exc:
raise click.ClickException(str(exc)) from exc
_emit_local_index_data(result.to_dict(), output_format)
@main.group()
def template() -> None:
"""Render and inspect deterministic Markdown templates."""
@@ -1213,6 +1356,42 @@ def _emit_cache_data(data: dict, output_format: str) -> None:
click.echo(f"written: {data['written']}")
def _emit_ast_stats(data: dict, output_format: str) -> None:
if output_format == "json":
click.echo(json.dumps(data, indent=2, ensure_ascii=False))
elif output_format == "yaml":
click.echo(yaml.safe_dump(data, sort_keys=False))
else:
click.echo(f"document_path: {data['document_path']}")
for key, value in data["counts"].items():
click.echo(f"{key}: {value}")
click.echo(f"max_heading_depth: {data['max_heading_depth']}")
if data["token_types"]:
click.echo("token_types:")
for token_type, count in data["token_types"].items():
click.echo(f"- {token_type}: {count}")
def _emit_local_index_data(data: dict, output_format: str) -> None:
if output_format == "json":
click.echo(json.dumps(data, indent=2, ensure_ascii=False))
elif output_format == "yaml":
click.echo(yaml.safe_dump(data, sort_keys=False))
else:
click.echo(f"index_path: {data['index_path']}")
if data.get("schema_version"):
click.echo(f"schema_version: {data['schema_version']}")
if data.get("sources") is not None:
click.echo(f"sources: {data['sources']}")
if data.get("dirty") is not None:
click.echo("dirty" if data["dirty"] else "clean")
for key in ["parsed", "indexed", "metadata_updated", "deleted"]:
values = data.get(key, [])
click.echo(f"{key}: {len(values)}")
for value in values:
click.echo(f"- {value}")
def _emit_reference_result(data: dict, output_format: str) -> None:
if output_format == "json":
click.echo(json.dumps(data, indent=2, ensure_ascii=False))
@@ -1404,6 +1583,29 @@ def _set_path(mapping: dict[str, object], path: list[str], value: object) -> Non
current[path[-1]] = value
def _ast_stats(document: dict, document_path: str) -> dict:
token_types: dict[str, int] = {}
for token in document.get("tokens", []):
token_type = str(token.get("type", "unknown"))
token_types[token_type] = token_types.get(token_type, 0) + 1
headings = document.get("headings", [])
return {
"document_path": document_path,
"source_path": document.get("source_path"),
"counts": {
"frontmatter_keys": len(document.get("frontmatter", {})),
"headings": len(headings),
"sections": len(document.get("sections", [])),
"blocks": len(document.get("blocks", [])),
"tokens": len(document.get("tokens", [])),
},
"max_heading_depth": max(
[int(heading.get("level", 0)) for heading in headings] or [0]
),
"token_types": dict(sorted(token_types.items())),
}
def _load_template_data(data_file: Path | None) -> dict[str, object]:
if data_file is None:
return {}

View File

@@ -0,0 +1,89 @@
from pathlib import Path
from click.testing import CliRunner
from markitect_tool.backend import LocalSnapshotStore, local_index_path_for
from markitect_tool.cli import main
def test_local_snapshot_store_persists_state_and_document(tmp_path: Path):
source = tmp_path / "doc.md"
source.write_text("---\ntitle: Example\n---\n# Doc\n\nBody.\n", encoding="utf-8")
store = LocalSnapshotStore(tmp_path / ".markitect" / "cache" / "index.sqlite3")
state = store.put_file(source, root=tmp_path)
loaded = store.load_state()
document = store.get_document("doc.md")
assert state.path == "doc.md"
assert state.snapshot_id.startswith("snapshot:")
assert loaded[0].path == "doc.md"
assert loaded[0].content_hash == state.content_hash
assert document["frontmatter"]["title"] == "Example"
assert document["headings"][0]["text"] == "Doc"
def test_local_snapshot_store_build_is_incremental(tmp_path: Path):
source = tmp_path / "doc.md"
source.write_text("# Doc\n", encoding="utf-8")
store = LocalSnapshotStore(local_index_path_for(tmp_path))
first = store.build([tmp_path], root=tmp_path)
second = store.build([tmp_path], root=tmp_path)
assert first.parsed == ["doc.md"]
assert first.indexed == ["doc.md"]
assert second.parsed == []
assert second.indexed == []
assert not second.dirty
source.write_text("# Doc\n\nChanged.\n", encoding="utf-8")
changed = store.build([tmp_path], root=tmp_path)
assert changed.parsed == ["doc.md"]
assert changed.indexed == ["doc.md"]
def test_local_snapshot_store_deletes_removed_files(tmp_path: Path):
source = tmp_path / "doc.md"
source.write_text("# Doc\n", encoding="utf-8")
store = LocalSnapshotStore(local_index_path_for(tmp_path))
store.build([tmp_path], root=tmp_path)
source.unlink()
result = store.build([tmp_path], root=tmp_path)
assert result.deleted == ["doc.md"]
assert store.load_state() == []
def test_mkt_ast_show_and_stats(tmp_path: Path):
source = tmp_path / "doc.md"
source.write_text("# Doc\n\nBody.\n", encoding="utf-8")
runner = CliRunner()
shown = runner.invoke(main, ["ast", "show", str(source), "--format", "tree"])
stats = runner.invoke(main, ["ast", "stats", str(source)])
assert shown.exit_code == 0
assert "# Doc" in shown.output
assert stats.exit_code == 0
assert "headings: 1" in stats.output
assert "paragraph_open" in stats.output
def test_mkt_cache_init_and_index(tmp_path: Path):
source = tmp_path / "doc.md"
source.write_text("# Doc\n", encoding="utf-8")
runner = CliRunner()
initialized = runner.invoke(main, ["cache", "init", "--root", str(tmp_path)])
indexed = runner.invoke(main, ["cache", "index", str(tmp_path), "--root", str(tmp_path)])
clean = runner.invoke(main, ["cache", "index", str(tmp_path), "--root", str(tmp_path)])
assert initialized.exit_code == 0
assert "schema_version: 1" in initialized.output
assert indexed.exit_code == 0
assert "parsed: 1" in indexed.output
assert clean.exit_code == 0
assert "clean" in clean.output

View File

@@ -51,7 +51,7 @@ directly and should report actual refresh work against the same categories.
```task
id: MKTT-WP-0007-T001
status: todo
status: done
priority: high
state_hub_task_id: "8894a9a4-586c-457b-b4e6-add8276ff5f2"
```
@@ -59,6 +59,10 @@ state_hub_task_id: "8894a9a4-586c-457b-b4e6-add8276ff5f2"
Persist parsed document snapshots and source metadata in a local cache
directory.
Implemented: `LocalSnapshotStore`, SQLite schema initialization, source-state
loading, parsed document JSON persistence, provenance envelope storage, and
relative path handling. See `docs/local-index-backend.md`.
Implementation hints:
- Persist `SnapshotState` fields in the snapshot/source tables.
@@ -71,7 +75,7 @@ Implementation hints:
```task
id: MKTT-WP-0007-T002
status: todo
status: done
priority: high
state_hub_task_id: "fb9eaa9d-5c20-49a9-a7a6-acae28ac5e20"
```
@@ -86,6 +90,9 @@ mkt ast stats <file>
Use the current parsed document and token model. Do not require cache presence
for single-file use.
Implemented: `mkt ast show <file>` and `mkt ast stats <file>` with JSON, YAML,
tree/text output modes.
## P7.3 - Add optional JSONPath query adapter
```task
@@ -102,7 +109,7 @@ shared query result envelope.
```task
id: MKTT-WP-0007-T004
status: todo
status: done
priority: medium
state_hub_task_id: "479f11a3-4ab4-451b-991c-7f143f2bffea"
```
@@ -121,6 +128,11 @@ Implementation hints:
- Preserve source spans and content-unit ids from WP-0010 reference/literate
layers.
Implemented: source, heading, section, block, dependency, and metadata tables;
document/frontmatter/metrics/provenance JSON payloads; hot-path indexes on
path, content hash, snapshot id, parser identity, unit path, and dependency
target.
## P7.5 - Add FTS5 section/block search
```task
@@ -137,7 +149,7 @@ relevance ranking.
```task
id: MKTT-WP-0007-T006
status: todo
status: done
priority: medium
state_hub_task_id: "7d9472e6-0716-435b-866c-d2c66ad786cf"
```
@@ -156,6 +168,11 @@ Implementation hints:
- Report planned vs actual counts for hash, parse, index, metadata update,
delete, and invalidation work.
Implemented first pass: `LocalSnapshotStore.build()` drives refresh from
`SnapshotRefreshPlan`, hashes metadata-changed files by default, skips
unchanged content, updates metadata-only rows, refreshes changed snapshots, and
deletes removed files.
## P7.7 - Add local index CLI
```task
@@ -174,6 +191,10 @@ mkt cache query <selector-or-query>
mkt search <text>
```
Partial implementation: `mkt cache init` initializes the local SQLite store and
`mkt cache index <path>` builds or refreshes it. Cache-backed query and FTS
search remain part of this task.
## Exit Criteria
- Legacy AST/JSONPath value is recovered as an optional backend.