generated from coulomb/repo-seed
SQLite-backed local snapshot store
87
docs/local-index-backend.md
Normal file
@@ -0,0 +1,87 @@
# Local Index Backend

`markitect-tool` now includes a local SQLite snapshot/index backend as the
first practical implementation of the optional backend fabric.

## Purpose

The local index is optimized for repeatable Markdown infrastructure work:

- persist parsed document snapshots
- keep cheap source metadata for incremental refresh planning
- store document JSON for later AST/JSONPath use
- index frontmatter, headings, sections, blocks, and metrics
- preserve extension points for dependency edges, references, named regions,
  chunks, processor outputs, FTS, and policy-aware access

The backend is optional. Single-file commands such as `mkt parse`, `mkt query`,
and `mkt ast` do not require it.

## Commands

Initialize the SQLite store:

```text
mkt cache init --root .
```

Build or refresh the local index:

```text
mkt cache index docs workplans --root .
```

Inspect a parsed AST without using the cache:

```text
mkt ast show docs/backend-fabric.md --format tree
mkt ast stats docs/backend-fabric.md
```

By default, the index is written to:

```text
.markitect/cache/index.sqlite3
```

Use `--index-path` to override it.

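As a rough illustration of how the default location and the `--index-path` override combine, the sketch below mirrors the `local_index_path_for` helper added in this commit. The name `resolve_index_path` is hypothetical and used only for this example; a relative override is assumed to be resolved under the root, while an absolute override is used unchanged.

```python
from pathlib import Path

# Default location, as documented above.
DEFAULT_LOCAL_INDEX_PATH = ".markitect/cache/index.sqlite3"


def resolve_index_path(root, index_path=None):
    """Hypothetical helper: pick the SQLite index path for a root."""
    path = Path(index_path or DEFAULT_LOCAL_INDEX_PATH)
    if path.is_absolute():
        # An absolute override ignores the root entirely.
        return path
    # Relative paths (default or override) live under the root.
    return Path(root) / path


print(resolve_index_path("."))
print(resolve_index_path(".", "build/index.sqlite3"))
```

Keeping the default relative to the root means each working tree gets its own index unless the caller opts into a shared absolute path.
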
## Refresh Behavior

`mkt cache index` uses the same cheap-first refresh planning model as
`mkt backend refresh-plan`:

1. Compare path, size, mtime, parser identity, parse options, and contract hash.
2. Hash only files whose metadata changed.
3. Skip parse/index when metadata changed but content hash stayed the same.
4. Parse and index new or changed files.
5. Delete rows for removed source files.

The command reports planned work and actual work separately in JSON/YAML output.

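The steps above can be sketched roughly as follows. This is not the project's `plan_snapshot_refresh`; it is a simplified stand-in that only looks at size, mtime, and content hash, and leaves out parser identity, parse options, and the contract hash. `Known`, `plan_refresh`, the action names, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Dict


@dataclass(frozen=True)
class Known:
    """Hypothetical stand-in for one previously indexed source row."""

    size: int
    mtime_ns: int
    content_hash: str


def plan_refresh(root: Path, previous: Dict[str, Known]) -> Dict[str, str]:
    """Return one planned action per relative path (illustrative only)."""
    actions: Dict[str, str] = {}
    current = {p.relative_to(root).as_posix(): p for p in sorted(root.rglob("*.md"))}
    for rel, path in current.items():
        known = previous.get(rel)
        if known is None:
            actions[rel] = "parse"  # new file: parse and index
            continue
        stat = path.stat()
        if (stat.st_size, stat.st_mtime_ns) == (known.size, known.mtime_ns):
            actions[rel] = "skip"  # cheap metadata matches: no hashing needed
            continue
        # Metadata changed, so pay for a content hash only now.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        actions[rel] = "metadata" if digest == known.content_hash else "parse"
    for rel in previous.keys() - current.keys():
        actions[rel] = "delete"  # source file was removed
    return actions
```

The point of the ordering is cost: `stat` calls are cheap, hashing reads the whole file, and parsing is the most expensive step, so each stage runs only for files that the previous stage could not rule out.
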
## Stored Data

The first schema stores:

- `sources`: path, absolute path, size, mtime, content hash, snapshot id,
  parser identity, parse option hash, contract hash, document JSON,
  frontmatter JSON, metrics JSON, provenance JSON, and indexed flag
- `headings`: heading level, text, and source line
- `sections`: heading metadata, section text, and source span
- `blocks`: block type, text, source span, and heading level
- `dependencies`: reserved dependency edge table for references,
  transclusion, literate chunks, and future invalidation graphs

This is enough to recover the useful markitect-main idea of keeping parsed
structure available for faster and richer query backends, while keeping the
normal CLI usable without a cache.

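To make the stored rows more concrete, here is a small sketch that reads a document outline back from a `headings` table. The table definition copies the column layout from the schema in this commit, but it is created in an in-memory database, and the sample rows, the `outline` function, and the path `docs/example.md` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    """
    create table headings(
        snapshot_id text not null,
        path text not null,
        idx integer not null,
        level integer not null,
        text text not null,
        line integer not null,
        primary key(snapshot_id, idx)
    )
    """
)
# Made-up sample rows standing in for an indexed document.
conn.executemany(
    "insert into headings values (?, ?, ?, ?, ?, ?)",
    [
        ("snap-1", "docs/example.md", 0, 1, "Example", 1),
        ("snap-1", "docs/example.md", 1, 2, "Purpose", 5),
        ("snap-1", "docs/example.md", 2, 2, "Commands", 20),
    ],
)


def outline(conn, path):
    """Hypothetical helper: list headings for one source path in order."""
    rows = conn.execute(
        "select level, text, line from headings where path = ? order by idx",
        (path,),
    ).fetchall()
    return [f"{'#' * row['level']} {row['text']} (line {row['line']})" for row in rows]


for entry in outline(conn, "docs/example.md"):
    print(entry)
```

A query like this needs neither the source file nor a parser run, which is the kind of cache-backed lookup the per-unit tables are meant to enable.
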
## Future Work

`MKTT-WP-0007` still needs:

- JSONPath query adapter over stored or live document JSON
- FTS5 search over section/block rows
- cache-backed query commands
- richer dependency extraction from references, transclusion, and literate
  chunks
@@ -33,7 +33,7 @@ and descriptions mirror the operational view.
 | `MKTT-WP-0003` | complete | done | `MKTT-WP-0001`, `MKTT-WP-0002`, `MKTT-WP-0004` | Core toolkit implementation is complete. |
 | `MKTT-WP-0006` | complete | done | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T005` | Optional backend fabric is complete: manifests, capabilities, snapshot identity, interfaces, registry, provenance, and read-only CLI scaffolding. |
 | `MKTT-WP-0010` | complete | done | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T006` | Content references, processors, explode/implode, weave/tangle, content classes, and migration examples are complete as the first WP-0010 extension layer. |
-| `MKTT-WP-0007` | P2 | todo | `MKTT-WP-0006` | First practical cache backend use case: AST/JSONPath/SQLite/FTS. Preliminary refresh planning is in place as the performance contract. |
+| `MKTT-WP-0007` | P2 | todo | `MKTT-WP-0006` | First practical cache backend use case: AST/JSONPath/SQLite/FTS. SQLite snapshots, AST inspection, metadata indexing, and incremental refresh are in place; JSONPath, FTS, and cache-backed query remain. |
 | `MKTT-WP-0005` | P2 | todo | `MKTT-WP-0003`, `MKTT-WP-0004` | Pick up when generation/form/context or semantic assessment pressure appears. |
 | `MKTT-WP-0011` | P2 | todo | `MKTT-WP-0003`; task-level triggers: `MKTT-WP-0010-T001`, `MKTT-WP-0010-T005` | Declarative Markdown dataflow workflows: source extraction, deterministic/assisted processing, and multi-output generation. |
 | `MKTT-WP-0009` | P2 | todo | `MKTT-WP-0006` | Establish access-control gateway before security-sensitive cache/context use. |
@@ -32,6 +32,13 @@ from markitect_tool.backend.interfaces import (
     QueryAdapter,
     SnapshotBackend,
 )
+from markitect_tool.backend.local_store import (
+    DEFAULT_LOCAL_INDEX_PATH,
+    LOCAL_INDEX_SCHEMA_VERSION,
+    LocalIndexBuildResult,
+    LocalSnapshotStore,
+    local_index_path_for,
+)
 
 __all__ = [
     "BACKEND_CAPABILITIES",
@@ -60,4 +67,9 @@ __all__ = [
     "ProcessorResultStore",
     "QueryAdapter",
     "SnapshotBackend",
+    "DEFAULT_LOCAL_INDEX_PATH",
+    "LOCAL_INDEX_SCHEMA_VERSION",
+    "LocalIndexBuildResult",
+    "LocalSnapshotStore",
+    "local_index_path_for",
 ]
510
src/markitect_tool/backend/local_store.py
Normal file
@@ -0,0 +1,510 @@
"""Local SQLite snapshot and metadata store."""

from __future__ import annotations

import json
import sqlite3
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any

from markitect_tool.backend.engine import (
    EMPTY_PARSE_OPTIONS_HASH,
    PARSER_ID,
    PARSER_VERSION,
    DependencyEdge,
    ProvenanceEnvelope,
    snapshot_identity_for_file,
)
from markitect_tool.backend.planning import SnapshotState, plan_snapshot_refresh
from markitect_tool.cache import scan_markdown_files
from markitect_tool.contract import collect_metrics
from markitect_tool.core import parse_markdown_file


DEFAULT_LOCAL_INDEX_PATH = ".markitect/cache/index.sqlite3"
LOCAL_INDEX_SCHEMA_VERSION = "1"


@dataclass(frozen=True)
class LocalIndexBuildResult:
    """Summary of a local index build or refresh."""

    index_path: str
    root: str
    paths: list[str]
    planned: dict[str, Any]
    parsed: list[str] = field(default_factory=list)
    indexed: list[str] = field(default_factory=list)
    metadata_updated: list[str] = field(default_factory=list)
    deleted: list[str] = field(default_factory=list)

    @property
    def dirty(self) -> bool:
        return bool(self.parsed or self.indexed or self.metadata_updated or self.deleted)

    def to_dict(self) -> dict[str, Any]:
        data = asdict(self)
        data["dirty"] = self.dirty
        data["counts"] = {
            "parsed": len(self.parsed),
            "indexed": len(self.indexed),
            "metadata_updated": len(self.metadata_updated),
            "deleted": len(self.deleted),
        }
        return data


class LocalSnapshotStore:
    """SQLite-backed local snapshot store for parsed Markdown documents."""

    def __init__(self, path: str | Path = DEFAULT_LOCAL_INDEX_PATH) -> None:
        self.path = Path(path)

    def initialize(self) -> None:
        """Create or migrate the local index schema."""

        self.path.parent.mkdir(parents=True, exist_ok=True)
        with self._connect() as conn:
            _create_schema(conn)
            conn.execute(
                """
                insert into meta(key, value) values('schema_version', ?)
                on conflict(key) do update set value = excluded.value
                """,
                (LOCAL_INDEX_SCHEMA_VERSION,),
            )

    def load_state(self) -> list[SnapshotState]:
        """Load cheap refresh-planning state without loading document JSON."""

        if not self.path.exists():
            return []
        with self._connect() as conn:
            _create_schema(conn)
            rows = conn.execute(
                """
                select path, size, mtime_ns, content_hash, snapshot_id, parser,
                       parser_version, parse_options_hash, contract_hash, indexed
                from sources
                order by path
                """
            ).fetchall()
            dependencies = _load_dependencies(conn)
        return [
            SnapshotState(
                path=row["path"],
                size=row["size"],
                mtime_ns=row["mtime_ns"],
                content_hash=row["content_hash"],
                snapshot_id=row["snapshot_id"],
                parser=row["parser"],
                parser_version=row["parser_version"],
                parse_options_hash=row["parse_options_hash"],
                contract_hash=row["contract_hash"],
                indexed=bool(row["indexed"]),
                dependencies=dependencies.get(row["path"], []),
            )
            for row in rows
        ]

    def put_file(
        self,
        path: str | Path,
        *,
        root: str | Path = ".",
        parse_options: dict[str, Any] | None = None,
        contract_hash: str | None = None,
    ) -> SnapshotState:
        """Parse and persist one Markdown file."""

        self.initialize()
        file_path = Path(path)
        root_path = Path(root).resolve()
        relative_path = _relative(file_path, root_path)
        identity = snapshot_identity_for_file(
            file_path,
            parse_options=parse_options,
            contract_hash=contract_hash,
        )
        document = parse_markdown_file(file_path)
        metrics = collect_metrics(document).to_dict()
        stat = file_path.stat()
        now = datetime.now(timezone.utc).isoformat()
        provenance = ProvenanceEnvelope(
            operation="local_snapshot_store.put_file",
            snapshot_id=identity.snapshot_id,
            source_path=relative_path,
            content_hash=identity.content_hash,
            backend_id="local-sqlite",
        )
        with self._connect() as conn:
            _create_schema(conn)
            conn.execute(
                """
                insert into sources(
                    path, abs_path, size, mtime_ns, content_hash, snapshot_id,
                    parser, parser_version, parse_options_hash, contract_hash,
                    indexed, document_json, frontmatter_json, metrics_json,
                    provenance_json, updated_at
                ) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 1, ?, ?, ?, ?, ?)
                on conflict(path) do update set
                    abs_path = excluded.abs_path,
                    size = excluded.size,
                    mtime_ns = excluded.mtime_ns,
                    content_hash = excluded.content_hash,
                    snapshot_id = excluded.snapshot_id,
                    parser = excluded.parser,
                    parser_version = excluded.parser_version,
                    parse_options_hash = excluded.parse_options_hash,
                    contract_hash = excluded.contract_hash,
                    indexed = excluded.indexed,
                    document_json = excluded.document_json,
                    frontmatter_json = excluded.frontmatter_json,
                    metrics_json = excluded.metrics_json,
                    provenance_json = excluded.provenance_json,
                    updated_at = excluded.updated_at
                """,
                (
                    relative_path,
                    str(file_path.resolve()),
                    stat.st_size,
                    stat.st_mtime_ns,
                    identity.content_hash,
                    identity.snapshot_id,
                    identity.parser,
                    identity.parser_version,
                    identity.parse_options_hash,
                    identity.contract_hash,
                    _json(document.to_dict()),
                    _json(document.frontmatter),
                    _json(metrics),
                    _json(provenance.to_dict()),
                    now,
                ),
            )
            _replace_document_units(conn, relative_path, identity.snapshot_id, document.to_dict())
        return SnapshotState(
            path=relative_path,
            size=stat.st_size,
            mtime_ns=stat.st_mtime_ns,
            content_hash=identity.content_hash,
            snapshot_id=identity.snapshot_id,
            parser=identity.parser,
            parser_version=identity.parser_version,
            parse_options_hash=identity.parse_options_hash,
            contract_hash=identity.contract_hash,
            indexed=True,
        )

    def update_metadata(self, path: str, *, root: str | Path = ".") -> None:
        """Update file size and mtime when content hash is unchanged."""

        file_path = Path(root) / path
        stat = file_path.stat()
        with self._connect() as conn:
            _create_schema(conn)
            conn.execute(
                "update sources set size = ?, mtime_ns = ?, updated_at = ? where path = ?",
                (stat.st_size, stat.st_mtime_ns, datetime.now(timezone.utc).isoformat(), path),
            )

    def delete_path(self, path: str) -> None:
        """Delete one indexed source and derived rows."""

        if not self.path.exists():
            return
        with self._connect() as conn:
            _create_schema(conn)
            conn.execute("delete from blocks where path = ?", (path,))
            conn.execute("delete from sections where path = ?", (path,))
            conn.execute("delete from headings where path = ?", (path,))
            conn.execute("delete from dependencies where path = ?", (path,))
            conn.execute("delete from sources where path = ?", (path,))

    def get_document(self, path: str) -> dict[str, Any]:
        """Return stored document JSON for a relative source path."""

        with self._connect() as conn:
            _create_schema(conn)
            row = conn.execute(
                "select document_json from sources where path = ?",
                (path,),
            ).fetchone()
        if row is None:
            raise KeyError(f"No indexed document `{path}`")
        return json.loads(row["document_json"])

    def build(
        self,
        paths: list[str | Path],
        *,
        root: str | Path = ".",
        recursive: bool = True,
        parse_options: dict[str, Any] | None = None,
        contract_hash: str | None = None,
        verify_hashes: bool = True,
    ) -> LocalIndexBuildResult:
        """Incrementally build or refresh the local index."""

        self.initialize()
        root_path = Path(root).resolve()
        plan = plan_snapshot_refresh(
            paths,
            previous=self.load_state(),
            root=root_path,
            recursive=recursive,
            parse_options=parse_options,
            contract_hash=contract_hash,
            verify_hashes=verify_hashes,
        )
        current_files = {
            _relative(path, root_path): path
            for path in scan_markdown_files(paths, recursive=recursive)
        }
        parsed: list[str] = []
        indexed: list[str] = []
        metadata_updated: list[str] = []
        deleted: list[str] = []
        for entry in plan.entries:
            if "delete" in entry.actions:
                self.delete_path(entry.path)
                deleted.append(entry.path)
                continue
            if "parse" in entry.actions or "index" in entry.actions:
                file_path = current_files.get(entry.path)
                if file_path is None:
                    continue
                self.put_file(
                    file_path,
                    root=root_path,
                    parse_options=parse_options,
                    contract_hash=contract_hash,
                )
                if "parse" in entry.actions:
                    parsed.append(entry.path)
                if "index" in entry.actions:
                    indexed.append(entry.path)
                continue
            if "metadata" in entry.actions:
                self.update_metadata(entry.path, root=root_path)
                metadata_updated.append(entry.path)
        return LocalIndexBuildResult(
            index_path=str(self.path),
            root=str(root_path),
            paths=[str(path) for path in paths],
            planned=plan.to_dict(),
            parsed=parsed,
            indexed=indexed,
            metadata_updated=metadata_updated,
            deleted=deleted,
        )

    def _connect(self) -> sqlite3.Connection:
        conn = sqlite3.connect(self.path)
        conn.row_factory = sqlite3.Row
        conn.execute("pragma foreign_keys = on")
        return conn


def local_index_path_for(root: str | Path, index_path: str | Path | None = None) -> Path:
    """Return the local SQLite index path for a root and optional override."""

    path = Path(index_path or DEFAULT_LOCAL_INDEX_PATH)
    if path.is_absolute():
        return path
    return Path(root) / path


def _create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        create table if not exists meta(
            key text primary key,
            value text not null
        );
        create table if not exists sources(
            path text primary key,
            abs_path text not null,
            size integer not null,
            mtime_ns integer not null,
            content_hash text not null,
            snapshot_id text not null unique,
            parser text not null,
            parser_version text not null,
            parse_options_hash text not null,
            contract_hash text,
            indexed integer not null default 1,
            document_json text not null,
            frontmatter_json text not null,
            metrics_json text not null,
            provenance_json text not null,
            updated_at text not null
        );
        create table if not exists headings(
            snapshot_id text not null,
            path text not null,
            idx integer not null,
            level integer not null,
            text text not null,
            line integer not null,
            primary key(snapshot_id, idx)
        );
        create table if not exists sections(
            snapshot_id text not null,
            path text not null,
            idx integer not null,
            heading_text text not null,
            heading_level integer not null,
            line integer not null,
            text text not null,
            line_start integer,
            line_end integer,
            primary key(snapshot_id, idx)
        );
        create table if not exists blocks(
            snapshot_id text not null,
            path text not null,
            idx integer not null,
            type text not null,
            text text not null,
            line_start integer,
            line_end integer,
            heading_level integer,
            primary key(snapshot_id, idx)
        );
        create table if not exists dependencies(
            path text not null,
            source_id text not null,
            target text not null,
            kind text not null,
            target_snapshot_id text,
            metadata_json text not null default '{}'
        );
        create index if not exists idx_sources_content_hash on sources(content_hash);
        create index if not exists idx_sources_snapshot_id on sources(snapshot_id);
        create index if not exists idx_sources_parser on sources(parser, parser_version);
        create index if not exists idx_headings_path on headings(path);
        create index if not exists idx_sections_path on sections(path);
        create index if not exists idx_blocks_path on blocks(path);
        create index if not exists idx_dependencies_target on dependencies(target);
        """
    )


def _replace_document_units(
    conn: sqlite3.Connection,
    path: str,
    snapshot_id: str,
    document: dict[str, Any],
) -> None:
    conn.execute("delete from blocks where path = ?", (path,))
    conn.execute("delete from sections where path = ?", (path,))
    conn.execute("delete from headings where path = ?", (path,))
    for idx, heading in enumerate(document.get("headings", [])):
        conn.execute(
            """
            insert into headings(snapshot_id, path, idx, level, text, line)
            values (?, ?, ?, ?, ?, ?)
            """,
            (
                snapshot_id,
                path,
                idx,
                int(heading["level"]),
                str(heading["text"]),
                int(heading["line"]),
            ),
        )
    for idx, section in enumerate(document.get("sections", [])):
        heading = section["heading"]
        text = "\n\n".join(str(block.get("text", "")) for block in section.get("blocks", []))
        line_start = _first_present(block.get("line_start") for block in section.get("blocks", []))
        line_end = _last_present(block.get("line_end") for block in section.get("blocks", []))
        conn.execute(
            """
            insert into sections(
                snapshot_id, path, idx, heading_text, heading_level, line,
                text, line_start, line_end
            ) values (?, ?, ?, ?, ?, ?, ?, ?, ?)
            """,
            (
                snapshot_id,
                path,
                idx,
                str(heading["text"]),
                int(heading["level"]),
                int(heading["line"]),
                text,
                line_start,
                line_end,
            ),
        )
    for idx, block in enumerate(document.get("blocks", [])):
        conn.execute(
            """
            insert into blocks(
                snapshot_id, path, idx, type, text, line_start, line_end, heading_level
            ) values (?, ?, ?, ?, ?, ?, ?, ?)
            """,
            (
                snapshot_id,
                path,
                idx,
                str(block["type"]),
                str(block.get("text", "")),
                block.get("line_start"),
                block.get("line_end"),
                block.get("heading_level"),
            ),
        )


def _load_dependencies(conn: sqlite3.Connection) -> dict[str, list[DependencyEdge]]:
    rows = conn.execute(
        """
        select path, source_id, target, kind, target_snapshot_id, metadata_json
        from dependencies
        order by path, source_id, target
        """
    ).fetchall()
    dependencies: dict[str, list[DependencyEdge]] = {}
    for row in rows:
        dependencies.setdefault(row["path"], []).append(
            DependencyEdge(
                source_id=row["source_id"],
                target=row["target"],
                kind=row["kind"],
                target_snapshot_id=row["target_snapshot_id"],
                metadata=json.loads(row["metadata_json"] or "{}"),
            )
        )
    return dependencies


def _relative(path: Path, root: Path) -> str:
    resolved = path.resolve()
    try:
        return resolved.relative_to(root).as_posix()
    except ValueError:
        return resolved.as_posix()


def _json(data: Any) -> str:
    return json.dumps(data, sort_keys=True, ensure_ascii=False)


def _first_present(values: Any) -> int | None:
    for value in values:
        if value is not None:
            return int(value)
    return None


def _last_present(values: Any) -> int | None:
    found: int | None = None
    for value in values:
        if value is not None:
            found = int(value)
    return found
@@ -18,8 +18,10 @@ from markitect_tool.cache import (
 )
 from markitect_tool.backend import (
     BackendRegistryError,
+    LocalSnapshotStore,
     load_backend_registry,
     load_snapshot_state_file,
+    local_index_path_for,
     plan_snapshot_refresh,
     snapshot_identity_for_file,
 )
@@ -95,6 +97,51 @@ def parse(file: Path, output_format: str) -> None:
         click.echo(json.dumps(data, indent=2, ensure_ascii=False))
 
 
+@main.group()
+def ast() -> None:
+    """Inspect parsed Markdown ASTs and parser summaries."""
+
+
+@ast.command("show")
+@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
+@click.option(
+    "--format",
+    "output_format",
+    type=click.Choice(["json", "yaml", "tree"], case_sensitive=False),
+    default="json",
+    show_default=True,
+)
+def ast_show(file: Path, output_format: str) -> None:
+    """Show a parsed Markdown AST without requiring a cache."""
+
+    document = parse_markdown_file(file)
+    data = document.to_dict()
+    if output_format == "yaml":
+        click.echo(yaml.safe_dump(data, sort_keys=False))
+    elif output_format == "tree":
+        for heading in document.headings:
+            click.echo(f"{'#' * heading.level} {heading.text}")
+    else:
+        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
+
+
+@ast.command("stats")
+@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
+@click.option(
+    "--format",
+    "output_format",
+    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
+    default="text",
+    show_default=True,
+)
+def ast_stats(file: Path, output_format: str) -> None:
+    """Summarize parsed Markdown AST shape and token distribution."""
+
+    document = parse_markdown_file(file)
+    data = _ast_stats(document.to_dict(), str(file))
+    _emit_ast_stats(data, output_format)
+
+
 @main.command()
 @click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
 @click.option(
@@ -726,6 +773,40 @@ def cache() -> None:
     """Fingerprint Markdown files and detect changed inputs."""
 
 
+@cache.command("init")
+@click.option(
+    "--root",
+    type=click.Path(exists=True, file_okay=False, path_type=Path),
+    default=Path("."),
+    show_default=True,
+    help="Root used for the default local index path.",
+)
+@click.option(
+    "--index-path",
+    type=click.Path(dir_okay=False, path_type=Path),
+    help="SQLite index path. Defaults to .markitect/cache/index.sqlite3 under root.",
+)
+@click.option(
+    "--format",
+    "output_format",
+    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
+    default="text",
+    show_default=True,
+)
+def cache_init(root: Path, index_path: Path | None, output_format: str) -> None:
+    """Initialize the local SQLite snapshot/index store."""
+
+    resolved_index = local_index_path_for(root, index_path)
+    store = LocalSnapshotStore(resolved_index)
+    store.initialize()
+    data = {
+        "index_path": str(resolved_index),
+        "schema_version": "1",
+        "sources": len(store.load_state()),
+    }
+    _emit_local_index_data(data, output_format)
+
+
 @cache.command("fingerprint")
 @click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
 @click.option(
@@ -833,6 +914,68 @@ def cache_status(
|
|||||||
raise click.exceptions.Exit(1 if status.dirty else 0)
|
raise click.exceptions.Exit(1 if status.dirty else 0)
|
||||||
|
|
||||||
|
|
@cache.command("index")
@click.argument("paths", nargs=-1, required=True, type=click.Path(exists=True, path_type=Path))
@click.option(
    "--root",
    type=click.Path(exists=True, file_okay=False, path_type=Path),
    default=Path("."),
    show_default=True,
    help="Root used for relative index paths.",
)
@click.option(
    "--index-path",
    type=click.Path(dir_okay=False, path_type=Path),
    help="SQLite index path. Defaults to .markitect/cache/index.sqlite3 under root.",
)
@click.option("--no-recursive", is_flag=True, help="Do not recurse into directories.")
@click.option(
    "--no-verify-hashes",
    is_flag=True,
    help="Do not hash metadata-changed files before parsing.",
)
@click.option(
    "--parse-option",
    "parse_options",
    multiple=True,
    metavar="KEY=VALUE",
    help="Parse option included in the snapshot identity hash.",
)
@click.option("--contract-hash", help="Optional contract hash included in snapshot identity.")
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
    default="text",
    show_default=True,
)
def cache_index(
    paths: tuple[Path, ...],
    root: Path,
    index_path: Path | None,
    no_recursive: bool,
    no_verify_hashes: bool,
    parse_options: tuple[str, ...],
    contract_hash: str | None,
    output_format: str,
) -> None:
    """Build or refresh the local SQLite snapshot/index store."""

    try:
        store = LocalSnapshotStore(local_index_path_for(root, index_path))
        result = store.build(
            list(paths),
            root=root,
            recursive=not no_recursive,
            parse_options=_parse_key_value_options(parse_options),
            contract_hash=contract_hash,
            verify_hashes=not no_verify_hashes,
        )
    except ValueError as exc:
        raise click.ClickException(str(exc)) from exc
    _emit_local_index_data(result.to_dict(), output_format)


||||||
@main.group()
|
@main.group()
|
||||||
def template() -> None:
|
def template() -> None:
|
||||||
"""Render and inspect deterministic Markdown templates."""
|
"""Render and inspect deterministic Markdown templates."""
|
||||||
@@ -1213,6 +1356,42 @@ def _emit_cache_data(data: dict, output_format: str) -> None:
|
|||||||
click.echo(f"written: {data['written']}")
|
click.echo(f"written: {data['written']}")
|
||||||
|
|
||||||
|
|
def _emit_ast_stats(data: dict, output_format: str) -> None:
    if output_format == "json":
        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
    elif output_format == "yaml":
        click.echo(yaml.safe_dump(data, sort_keys=False))
    else:
        click.echo(f"document_path: {data['document_path']}")
        for key, value in data["counts"].items():
            click.echo(f"{key}: {value}")
        click.echo(f"max_heading_depth: {data['max_heading_depth']}")
        if data["token_types"]:
            click.echo("token_types:")
            for token_type, count in data["token_types"].items():
                click.echo(f"- {token_type}: {count}")


def _emit_local_index_data(data: dict, output_format: str) -> None:
    if output_format == "json":
        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
    elif output_format == "yaml":
        click.echo(yaml.safe_dump(data, sort_keys=False))
    else:
        click.echo(f"index_path: {data['index_path']}")
        if data.get("schema_version"):
            click.echo(f"schema_version: {data['schema_version']}")
        if data.get("sources") is not None:
            click.echo(f"sources: {data['sources']}")
        if data.get("dirty") is not None:
            click.echo("dirty" if data["dirty"] else "clean")
        for key in ["parsed", "indexed", "metadata_updated", "deleted"]:
            values = data.get(key, [])
            click.echo(f"{key}: {len(values)}")
            for value in values:
                click.echo(f"- {value}")


||||||
def _emit_reference_result(data: dict, output_format: str) -> None:
|
def _emit_reference_result(data: dict, output_format: str) -> None:
|
||||||
if output_format == "json":
|
if output_format == "json":
|
||||||
click.echo(json.dumps(data, indent=2, ensure_ascii=False))
|
click.echo(json.dumps(data, indent=2, ensure_ascii=False))
|
||||||
@@ -1404,6 +1583,29 @@ def _set_path(mapping: dict[str, object], path: list[str], value: object) -> Non
|
|||||||
current[path[-1]] = value
|
current[path[-1]] = value
|
||||||
|
|
||||||
|
|
def _ast_stats(document: dict, document_path: str) -> dict:
    token_types: dict[str, int] = {}
    for token in document.get("tokens", []):
        token_type = str(token.get("type", "unknown"))
        token_types[token_type] = token_types.get(token_type, 0) + 1
    headings = document.get("headings", [])
    return {
        "document_path": document_path,
        "source_path": document.get("source_path"),
        "counts": {
            "frontmatter_keys": len(document.get("frontmatter", {})),
            "headings": len(headings),
            "sections": len(document.get("sections", [])),
            "blocks": len(document.get("blocks", [])),
            "tokens": len(document.get("tokens", [])),
        },
        "max_heading_depth": max(
            [int(heading.get("level", 0)) for heading in headings] or [0]
        ),
        "token_types": dict(sorted(token_types.items())),
    }


||||||
def _load_template_data(data_file: Path | None) -> dict[str, object]:
|
def _load_template_data(data_file: Path | None) -> dict[str, object]:
|
||||||
if data_file is None:
|
if data_file is None:
|
||||||
return {}
|
return {}
|
||||||
|
|||||||
89
tests/test_local_snapshot_store.py
Normal file
@@ -0,0 +1,89 @@
|
|||||||
from pathlib import Path

from click.testing import CliRunner

from markitect_tool.backend import LocalSnapshotStore, local_index_path_for
from markitect_tool.cli import main


def test_local_snapshot_store_persists_state_and_document(tmp_path: Path):
    source = tmp_path / "doc.md"
    source.write_text("---\ntitle: Example\n---\n# Doc\n\nBody.\n", encoding="utf-8")
    store = LocalSnapshotStore(tmp_path / ".markitect" / "cache" / "index.sqlite3")

    state = store.put_file(source, root=tmp_path)
    loaded = store.load_state()
    document = store.get_document("doc.md")

    assert state.path == "doc.md"
    assert state.snapshot_id.startswith("snapshot:")
    assert loaded[0].path == "doc.md"
    assert loaded[0].content_hash == state.content_hash
    assert document["frontmatter"]["title"] == "Example"
    assert document["headings"][0]["text"] == "Doc"


def test_local_snapshot_store_build_is_incremental(tmp_path: Path):
    source = tmp_path / "doc.md"
    source.write_text("# Doc\n", encoding="utf-8")
    store = LocalSnapshotStore(local_index_path_for(tmp_path))

    first = store.build([tmp_path], root=tmp_path)
    second = store.build([tmp_path], root=tmp_path)

    assert first.parsed == ["doc.md"]
    assert first.indexed == ["doc.md"]
    assert second.parsed == []
    assert second.indexed == []
    assert not second.dirty

    source.write_text("# Doc\n\nChanged.\n", encoding="utf-8")
    changed = store.build([tmp_path], root=tmp_path)

    assert changed.parsed == ["doc.md"]
    assert changed.indexed == ["doc.md"]


def test_local_snapshot_store_deletes_removed_files(tmp_path: Path):
    source = tmp_path / "doc.md"
    source.write_text("# Doc\n", encoding="utf-8")
    store = LocalSnapshotStore(local_index_path_for(tmp_path))
    store.build([tmp_path], root=tmp_path)

    source.unlink()
    result = store.build([tmp_path], root=tmp_path)

    assert result.deleted == ["doc.md"]
    assert store.load_state() == []


def test_mkt_ast_show_and_stats(tmp_path: Path):
    source = tmp_path / "doc.md"
    source.write_text("# Doc\n\nBody.\n", encoding="utf-8")
    runner = CliRunner()

    shown = runner.invoke(main, ["ast", "show", str(source), "--format", "tree"])
    stats = runner.invoke(main, ["ast", "stats", str(source)])

    assert shown.exit_code == 0
    assert "# Doc" in shown.output
    assert stats.exit_code == 0
    assert "headings: 1" in stats.output
    assert "paragraph_open" in stats.output


def test_mkt_cache_init_and_index(tmp_path: Path):
    source = tmp_path / "doc.md"
    source.write_text("# Doc\n", encoding="utf-8")
    runner = CliRunner()

    initialized = runner.invoke(main, ["cache", "init", "--root", str(tmp_path)])
    indexed = runner.invoke(main, ["cache", "index", str(tmp_path), "--root", str(tmp_path)])
    clean = runner.invoke(main, ["cache", "index", str(tmp_path), "--root", str(tmp_path)])

    assert initialized.exit_code == 0
    assert "schema_version: 1" in initialized.output
    assert indexed.exit_code == 0
    assert "parsed: 1" in indexed.output
    assert clean.exit_code == 0
    assert "clean" in clean.output
||||||
@@ -51,7 +51,7 @@ directly and should report actual refresh work against the same categories.
|
|||||||
|
|
||||||
```task
|
```task
|
||||||
id: MKTT-WP-0007-T001
|
id: MKTT-WP-0007-T001
|
||||||
status: todo
|
status: done
|
||||||
priority: high
|
priority: high
|
||||||
state_hub_task_id: "8894a9a4-586c-457b-b4e6-add8276ff5f2"
|
state_hub_task_id: "8894a9a4-586c-457b-b4e6-add8276ff5f2"
|
||||||
```
|
```
|
||||||
@@ -59,6 +59,10 @@ state_hub_task_id: "8894a9a4-586c-457b-b4e6-add8276ff5f2"
|
|||||||
Persist parsed document snapshots and source metadata in a local cache
|
Persist parsed document snapshots and source metadata in a local cache
|
||||||
directory.
|
directory.
|
||||||
|
|
Implemented: `LocalSnapshotStore`, SQLite schema initialization, source-state
loading, parsed document JSON persistence, provenance envelope storage, and
relative path handling. See `docs/local-index-backend.md`.

||||||
Implementation hints:
|
Implementation hints:
|
||||||
|
|
||||||
- Persist `SnapshotState` fields in the snapshot/source tables.
|
- Persist `SnapshotState` fields in the snapshot/source tables.
|
||||||
@@ -71,7 +75,7 @@ Implementation hints:
|
|||||||
|
|
||||||
```task
|
```task
|
||||||
id: MKTT-WP-0007-T002
|
id: MKTT-WP-0007-T002
|
||||||
status: todo
|
status: done
|
||||||
priority: high
|
priority: high
|
||||||
state_hub_task_id: "fb9eaa9d-5c20-49a9-a7a6-acae28ac5e20"
|
state_hub_task_id: "fb9eaa9d-5c20-49a9-a7a6-acae28ac5e20"
|
||||||
```
|
```
|
||||||
@@ -86,6 +90,9 @@ mkt ast stats <file>
|
|||||||
Use the current parsed document and token model. Do not require cache presence
|
Use the current parsed document and token model. Do not require cache presence
|
||||||
for single-file use.
|
for single-file use.
|
||||||
|
|
Implemented: `mkt ast show <file>` and `mkt ast stats <file>` with JSON, YAML,
and tree/text output modes.

||||||
## P7.3 - Add optional JSONPath query adapter
|
## P7.3 - Add optional JSONPath query adapter
|
||||||
|
|
||||||
```task
|
```task
|
||||||
@@ -102,7 +109,7 @@ shared query result envelope.
|
|||||||
|
|
||||||
```task
|
```task
|
||||||
id: MKTT-WP-0007-T004
|
id: MKTT-WP-0007-T004
|
||||||
status: todo
|
status: done
|
||||||
priority: medium
|
priority: medium
|
||||||
state_hub_task_id: "479f11a3-4ab4-451b-991c-7f143f2bffea"
|
state_hub_task_id: "479f11a3-4ab4-451b-991c-7f143f2bffea"
|
||||||
```
|
```
|
||||||
@@ -121,6 +128,11 @@ Implementation hints:
|
|||||||
- Preserve source spans and content-unit ids from WP-0010 reference/literate
|
- Preserve source spans and content-unit ids from WP-0010 reference/literate
|
||||||
layers.
|
layers.
|
||||||
|
|
Implemented: source, heading, section, block, dependency, and metadata tables;
document/frontmatter/metrics/provenance JSON payloads; hot-path indexes on
path, content hash, snapshot id, parser identity, unit path, and dependency
target.

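As a rough illustration of the table-plus-index layout described above, the sketch below creates a much smaller schema with Python's standard `sqlite3` module. The table names, column names, and index names are hypothetical; the schema that `LocalSnapshotStore` actually creates is not shown in this excerpt and may differ.

```python
import sqlite3

# Hypothetical, simplified schema. Table, column, and index names are
# illustrative assumptions, not the tables LocalSnapshotStore creates.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sources (
    path TEXT PRIMARY KEY,
    content_hash TEXT NOT NULL,
    snapshot_id TEXT NOT NULL,
    document_json TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS headings (
    source_path TEXT NOT NULL REFERENCES sources(path) ON DELETE CASCADE,
    level INTEGER NOT NULL,
    text TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_sources_content_hash ON sources(content_hash);
CREATE INDEX IF NOT EXISTS idx_sources_snapshot_id ON sources(snapshot_id);
CREATE INDEX IF NOT EXISTS idx_headings_source_path ON headings(source_path);
"""


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the index database and ensure the schema exists."""
    conn = sqlite3.connect(path)
    # Foreign keys are off by default in SQLite; enable them so that
    # deleting a source row also removes its heading rows.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

Keeping the full document as a JSON payload next to narrow, indexed columns lets lookups by path or hash stay cheap while later AST or JSONPath work can still read the whole document.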
||||||
## P7.5 - Add FTS5 section/block search
|
## P7.5 - Add FTS5 section/block search
|
||||||
|
|
||||||
```task
|
```task
|
||||||
@@ -137,7 +149,7 @@ relevance ranking.
|
|||||||
|
|
||||||
```task
|
```task
|
||||||
id: MKTT-WP-0007-T006
|
id: MKTT-WP-0007-T006
|
||||||
status: todo
|
status: done
|
||||||
priority: medium
|
priority: medium
|
||||||
state_hub_task_id: "7d9472e6-0716-435b-866c-d2c66ad786cf"
|
state_hub_task_id: "7d9472e6-0716-435b-866c-d2c66ad786cf"
|
||||||
```
|
```
|
||||||
@@ -156,6 +168,11 @@ Implementation hints:
|
|||||||
- Report planned vs actual counts for hash, parse, index, metadata update,
|
- Report planned vs actual counts for hash, parse, index, metadata update,
|
||||||
delete, and invalidation work.
|
delete, and invalidation work.
|
||||||
|
|
Implemented first pass: `LocalSnapshotStore.build()` drives refresh from
`SnapshotRefreshPlan`, hashes metadata-changed files by default, skips
unchanged content, updates metadata-only rows, refreshes changed snapshots, and
deletes removed files.

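The cheap-first refresh order described above can be sketched as follows. This is a minimal illustration, not the actual `SnapshotRefreshPlan` logic: `StoredState`, `file_hash`, and `classify` are hypothetical names, and the choice of size plus mtime as the cheap check is an assumption.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StoredState:
    """Hypothetical stand-in for the per-source state kept in the index."""

    size: int
    mtime_ns: int
    content_hash: str


def file_hash(path: Path) -> str:
    """Hash file content; only called when cheap metadata has changed."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def classify(root: Path, stored: dict[str, StoredState]) -> dict[str, list[str]]:
    """Sort Markdown files under root into refresh categories."""
    plan: dict[str, list[str]] = {
        "parse": [],
        "metadata_only": [],
        "unchanged": [],
        "delete": [],
    }
    seen: set[str] = set()
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root).as_posix()
        seen.add(rel)
        stat = path.stat()
        old = stored.get(rel)
        if old is None:
            plan["parse"].append(rel)  # new file: must be parsed
        elif (stat.st_size, stat.st_mtime_ns) == (old.size, old.mtime_ns):
            plan["unchanged"].append(rel)  # cheap check passed, no hashing
        elif file_hash(path) == old.content_hash:
            plan["metadata_only"].append(rel)  # touched, content identical
        else:
            plan["parse"].append(rel)  # content really changed
    # Anything stored but no longer on disk is scheduled for deletion.
    plan["delete"] = sorted(set(stored) - seen)
    return plan
```

Hashing only the files whose metadata changed keeps a no-op refresh close to the cost of a directory walk, which is the point of planning cheap checks before expensive ones.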
||||||
## P7.7 - Add local index CLI
|
## P7.7 - Add local index CLI
|
||||||
|
|
||||||
```task
|
```task
|
||||||
@@ -174,6 +191,10 @@ mkt cache query <selector-or-query>
|
|||||||
mkt search <text>
|
mkt search <text>
|
||||||
```
|
```
|
||||||
|
|
Partial implementation: `mkt cache init` initializes the local SQLite store and
`mkt cache index <path>` builds or refreshes it. Cache-backed query and FTS
search remain part of this task.

||||||
## Exit Criteria
|
## Exit Criteria
|
||||||
|
|
||||||
- Legacy AST/JSONPath value is recovered as an optional backend.
|
- Legacy AST/JSONPath value is recovered as an optional backend.
|
||||||
|
|||||||