generated from coulomb/repo-seed
Workplan dependencies and prio for text research lab workplans
259
docs/cache-backend-architecture-blueprint.md
Normal file
@@ -0,0 +1,259 @@
# Cache Backend Architecture Blueprint

Date: 2026-05-03

## Purpose

This blueprint defines an optional backend architecture for sophisticated
knowledge systems built on top of `markitect-tool`.

It is a research-lab architecture: powerful enough to support cached ASTs,
advanced query backends, agent memory, and access control, but separated from
the slim core so one-off CLI use stays fast and simple.

## Architectural Boundary

The core package owns:

- Markdown parsing
- document contracts
- simple selectors
- deterministic transforms and generation primitives
- unified diagnostics

The optional backend fabric owns:

- persistent snapshots
- indexes
- advanced query adapters
- memory/context packages
- policy enforcement
- provenance records
- trace and performance metadata

The core must be able to run without the backend fabric.

## Conceptual Layers

```text
Markdown files
  -> Core parser and contract layer
  -> Content-addressed document snapshots
  -> Index fabric
       -> AST/JSON index
       -> full-text index
       -> vector/semantic index
       -> analytical/index export
  -> Query adapter registry
       -> simple selectors
       -> JSONPath
       -> SQL/FTS
       -> vector/hybrid retrieval
  -> Context package registry
       -> activated working sets
       -> memory namespaces
       -> agent-ready context bundles
  -> Access policy gateway
       -> labels/ACL/ReBAC/ABAC
       -> result filtering and denial diagnostics
  -> Provenance and observability
```

## Core Interfaces

### Snapshot Backend

Responsible for durable parsed-document snapshots.

Minimum protocol:

```text
put_document(source_path, content, parse_options) -> snapshot_id
get_snapshot(snapshot_id) -> DocumentSnapshot
resolve_source(source_path) -> latest snapshot_id
diff_snapshot(old_id, new_id) -> SnapshotDiff
```

Snapshot identity should include:

- source content hash
- parser version
- parse options
- contract version when relevant
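
One way to combine these inputs into a single identity, as a minimal sketch: hash a canonical JSON encoding of the four fields. The function name `snapshot_id`, the field names, and the choice of SHA-256 are assumptions made for illustration; the blueprint does not prescribe them.

```python
import hashlib
import json
from typing import Optional


def snapshot_id(content: bytes, parser_version: str, parse_options: dict,
                contract_version: Optional[str] = None) -> str:
    """Derive a deterministic id from the identity inputs listed above (hypothetical helper)."""
    identity = {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "parser_version": parser_version,
        "parse_options": parse_options,
        "contract_version": contract_version,
    }
    # sort_keys makes the encoding independent of dict insertion order,
    # so equal options always produce the same id.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this shape, changing the parser version or any parse option yields a new id even when the source bytes are unchanged, which is what cache invalidation needs.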

### Index Backend

Responsible for derived lookup structures.

Minimum protocol:

```text
capabilities() -> IndexCapabilities
build(snapshot_ids, options) -> IndexBuildResult
refresh(changed_snapshots) -> IndexBuildResult
query(request) -> QueryResult
explain(request) -> QueryPlan
```

Capabilities should include:

- `jsonpath`
- `sql`
- `fts`
- `vector`
- `hybrid`
- `inline_tokens`
- `section_graph`
- `policy_pushdown`

### Query Adapter

Translates a stable Markitect query request into backend-specific execution.

Minimum protocol:

```text
name
supports(selector_or_query, target) -> bool
execute(document_or_backend, request) -> QueryResult
explain(request) -> QueryExplanation
```

Adapters must return a common result envelope:

- kind
- path
- value
- text
- source location
- snapshot id
- provenance
- policy decision
- backend metadata
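
The envelope fields above could be carried by a small frozen dataclass. This is a sketch only: the class name `ResultEnvelope`, the field types, and the rule of dropping unset fields on serialization are assumptions, loosely modeled on the existing `QueryMatch.to_dict()`.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class ResultEnvelope:
    """Hypothetical common envelope; field names follow the list above."""

    kind: str
    path: str
    value: Any
    text: Optional[str] = None
    source_location: Optional[dict] = None  # assumed shape, e.g. {"file": ..., "line": ...}
    snapshot_id: Optional[str] = None
    provenance: dict = field(default_factory=dict)
    policy_decision: Optional[str] = None  # assumed values, e.g. "allow" or "redacted"
    backend_metadata: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        """Serialize, dropping unset optional fields and empty mappings."""
        return {k: v for k, v in asdict(self).items() if v not in (None, {})}
```

Keeping one envelope type means a simple-selector match and an FTS hit can be rendered by the same CLI emitters.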

### Context Package Registry

Responsible for agent-ready working memory.

Minimum protocol:

```text
create_package(query_or_manifest, budget, policy) -> context_package_id
activate(package_id, thread_or_workspace) -> activation_id
deactivate(activation_id)
refresh(package_id) -> package_id
explain(package_id) -> ContextPackageReport
```

Context packages should include:

- included source spans
- summary layers
- token estimates
- provenance
- freshness
- policy labels
- retrieval recipe
- cache keys
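
To illustrate how the `budget` argument and the token estimates might interact, here is a rough sketch of greedy span packing. `Span`, `estimate_tokens`, and `pack_spans` are hypothetical names, and the four-characters-per-token estimate is a placeholder rather than a real tokenizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    source_path: str
    start_line: int
    end_line: int
    text: str


def estimate_tokens(text: str) -> int:
    """Crude placeholder estimate: roughly four characters per token."""
    return max(1, len(text) // 4)


def pack_spans(spans: List[Span], budget: int) -> dict:
    """Greedily include spans in order, skipping any that would exceed the budget."""
    included, used = [], 0
    for span in spans:
        cost = estimate_tokens(span.text)
        if used + cost > budget:
            continue  # this span does not fit; a later, smaller one still may
        included.append(span)
        used += cost
    return {"spans": included, "token_estimate": used, "budget": budget}
```

A real registry would also record provenance, freshness, and policy labels per span; the sketch only shows the budget accounting.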

### Access Policy Gateway

Responsible for authorization and redaction before results leave a backend.

Minimum protocol:

```text
authorize(subject, action, object, context) -> PolicyDecision
filter_results(subject, action, results, context) -> FilteredResults
explain_decision(decision_id) -> PolicyExplanation
```

Policy should support a ladder:

1. Labels and trust zones.
2. File/path ACLs.
3. Relationship-based access control.
4. Attribute/rule-based policies.
5. External authorization services.
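
The first rung of the ladder can be sketched with plain label sets. The signatures below are deliberately simplified relative to the protocol above (no `action` or `context`), and the rule "the subject must hold every label on the object" is an assumption chosen for illustration, not a decision recorded in this blueprint.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class PolicyDecision:
    allowed: bool
    reason: str


def authorize(subject_labels: FrozenSet[str], object_labels: FrozenSet[str]) -> PolicyDecision:
    """Allow only when the subject holds every label attached to the object."""
    missing = object_labels - subject_labels
    if missing:
        return PolicyDecision(False, "missing labels: " + ", ".join(sorted(missing)))
    return PolicyDecision(True, "all object labels granted")


def filter_results(subject_labels: FrozenSet[str], results: List[dict]) -> Tuple[list, list]:
    """Split results into allowed items and denial diagnostics."""
    allowed, denied = [], []
    for item in results:
        decision = authorize(subject_labels, frozenset(item.get("labels", ())))
        if decision.allowed:
            allowed.append(item)
        else:
            # Report the path and reason only, never the denied content itself.
            denied.append({"path": item["path"], "reason": decision.reason})
    return allowed, denied
```

Returning the denials alongside the allowed results keeps policy filtering visible to the caller, in line with the security notes in this blueprint.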

## Suggested Backend Manifest

Backends should register through a Markdown/YAML manifest:

````markdown
# Local SQLite Backend

```yaml markitect-backend
id: local-sqlite-cache
kind: cache-backend
capabilities:
  - snapshots
  - json
  - fts
  - sql
  - provenance
storage:
  engine: sqlite
  path: .markitect/cache/index.sqlite
policy:
  mode: labels
```
````

## CLI Direction

The first backend CLI should be explicit:

```text
mkt cache init
mkt cache build <path>
mkt cache status
mkt cache query <selector-or-query> --backend <name>
mkt ast show <file>
mkt ast query <file> <jsonpath>
mkt context pack <manifest-or-query>
mkt context activate <package-id>
mkt policy check <subject> <action> <object>
```

Do not hide persistence behind `mkt query`. The user should know when the tool
is querying live files versus a persistent backend.

## Recommended First Stack

Start with:

- content hashes in Python standard library
- SQLite for snapshot metadata, JSON, and FTS5
- JSONPath as an optional extra
- local filesystem cache directory
- simple label policy
- provenance tables

Defer:

- vector search until text/structure cache works
- external authorization engines until local policy model is stable
- MCP server exposure until resources/tools are secure and explainable
- distributed cache until local invalidation is boring

## Security Notes

Cached data becomes a new data exposure surface.

Minimum requirements before secure use:

- cache location is explicit
- cache entries know source path and content hash
- policy mode is visible
- query results report policy filtering
- context packages list what they include
- destructive cache operations require explicit command
- no backend silently sends document content to a network service

## Architecture Decision

Implement the backend fabric after deterministic transform/composition
primitives are underway, but before serious caching, agent memory, or advanced
query backends. This lets WP-0003 continue while reserving a clean path for the
research-lab track.
76
docs/query-extraction.md
Normal file
@@ -0,0 +1,76 @@
# Query And Extraction

Date: 2026-05-03

## Purpose

The first query layer keeps selection close to the structured Markdown model.
It is intentionally small and deterministic. JSONPath or another query backend
can be added later behind the same API if the simple selector language becomes
too limited.

## CLI

```text
mkt query <document.md> <selector> [--format json|yaml|text]
mkt extract <document.md> <selector> [--format text|json|yaml]
```

`query` returns structured matches. `extract` returns textual content from the
matches.

## Selectors

Supported targets:

- `document`, `$`, or `.`: full parsed document
- `frontmatter`: YAML frontmatter
- `headings`: heading objects
- `sections`: heading-led sections
- `blocks`: parsed content blocks
- `metrics`: document and section metrics

Supported path examples:

```text
frontmatter.status
frontmatter.owner.name
metrics.document.words
metrics.document.sections
```

Supported filters:

```text
headings[level=2]
headings[text=Decision]
headings[text~=decision]
sections[heading=Context]
sections[heading~=risk]
sections[contains=problem]
sections[contains~=PROBLEM]
blocks[type=paragraph]
blocks[contains~=follow-up]
```

`=` is exact and case-sensitive. `~=` is substring matching and
case-insensitive.
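
The two operators can be illustrated with a small stand-alone sketch. This is not the code in `markitect_tool.query.engine` (only part of that module is visible in this commit view); the regular expression and the helper names `parse_filter` and `matches` are assumptions that merely follow the semantics stated above.

```python
import re

# target[field=value] or target[field~=value]
_FILTER = re.compile(r"^(?P<target>\w+)\[(?P<field>\w+)(?P<op>~=|=)(?P<value>.*)\]$")


def parse_filter(selector: str) -> tuple:
    """Split a filter selector into (target, field, operator, value)."""
    match = _FILTER.match(selector)
    if match is None:
        raise ValueError(f"not a filter selector: {selector!r}")
    return match["target"], match["field"], match["op"], match["value"]


def matches(actual: str, op: str, expected: str) -> bool:
    """Apply the operator semantics described above."""
    if op == "=":
        return actual == expected  # exact, case-sensitive
    return expected.lower() in actual.lower()  # substring, case-insensitive
```

Under these rules `headings[text=decision]` would not match a heading titled `Decision`, while `headings[text~=decision]` would.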

## Current Boundary

This is not a full query language. It covers practical extraction from the
current parser model:

- frontmatter values
- headings
- sections
- content blocks
- metrics

Future query backend work should preserve this simple surface and add optional
adapters rather than forcing every user into a heavier language.

Advanced query and cache backends are tracked in:

- `docs/cache-backend-architecture-blueprint.md`
- `workplans/MKTT-WP-0007-advanced-query-and-local-index-backend.md`
248
docs/research-lab-cache-backend-research.md
Normal file
@@ -0,0 +1,248 @@
# Research Lab: Sophisticated Cache Backends

Date: 2026-05-03

## Purpose

This research note explores how `markitect-tool` can keep its slim,
markdown-native core while allowing sophisticated optional backends for cached
ASTs, structured indexes, multiple query paradigms, agent working memory, and
access-controlled knowledge systems.

The goal is not to rebuild `markitect-main` wholesale. The goal is to preserve
the useful insight behind it: once Markdown has been parsed into a trustworthy
structure, many higher-value operations become possible if that structure can
be cached, indexed, queried, reactivated, and governed.

## Research Signals

### Content Addressing And Reproducibility

Git's object model is a practical reference for content-addressed storage:
content is written to an object database and retrieved by a hash-derived key.
Bazel remote caching similarly separates action outputs from metadata so work
can be reused when inputs are unchanged.

Relevance:

- Parse results should be keyed by content hash, parser version, and options.
- Derived indexes should declare their input snapshots and invalidation rules.
- Reproducible context packages need stable object identities.

Sources:

- https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
- https://docs.bazel.build/versions/main/remote-caching.html

### Structured Query And AST Introspection

JSONPath is now standardized as RFC 9535. It defines selection and extraction
over JSON values and has security considerations around implementation behavior
and query construction. This makes it a good optional backend for power users
who need raw access to the full parsed structure.

SQLite JSON and FTS5 provide a pragmatic local storage/query foundation. FTS5
supports full-text search, relevance ranking, phrase/prefix/NEAR queries, and
external-content tables. These features map well to Markdown sections and
blocks while keeping local-first operation.

Relevance:

- Keep the current simple selector API as the common surface.
- Add JSONPath over `Document.to_dict()` as an optional advanced adapter.
- Add SQLite as the first local persistent index backend.
- Keep AST introspection as a debugging and research-lab capability, not as
  the default user interface.

Sources:

- https://www.rfc-editor.org/rfc/rfc9535.html
- https://www.sqlite.org/json1.html
- https://www.sqlite.org/fts5.html
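
To make the FTS5 mapping concrete, the sketch below indexes one row per Markdown section and runs a ranked term query, a prefix query, and a phrase query. The table name `sections`, its columns, and the sample rows are invented for illustration, and the example assumes the local SQLite build has FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per section; content_hash is stored for provenance but not tokenized.
conn.execute(
    "CREATE VIRTUAL TABLE sections USING fts5(path, heading, body, content_hash UNINDEXED)"
)
conn.executemany(
    "INSERT INTO sections (path, heading, body, content_hash) VALUES (?, ?, ?, ?)",
    [
        ("docs/a.md", "Context", "The cache stores parsed snapshots.", "h1"),
        ("docs/a.md", "Risks", "Stale cache entries can leak old content.", "h2"),
        ("docs/b.md", "Decision", "Use labels for the first policy mode.", "h3"),
    ],
)


def search(query: str) -> list:
    """Return (path, heading) pairs ordered by FTS5's built-in rank."""
    rows = conn.execute(
        "SELECT path, heading FROM sections WHERE sections MATCH ? ORDER BY rank",
        (query,),
    )
    return rows.fetchall()


print(search("decision"))            # plain term, matched case-insensitively
print(search("cach*"))               # prefix query
print(search('"parsed snapshots"'))  # phrase query
```

Storing the content hash next to each indexed row is one way to let an incremental refresh skip sections whose source has not changed.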

### Columnar And Vector Backends

Apache Arrow defines a language-independent columnar memory format. DuckDB is
strong for local analytical SQL over structured data. Vector databases such as
Qdrant, LanceDB, and pgvector provide semantic retrieval primitives.

Relevance:

- The core should not depend on any vector database.
- Index backends should advertise capabilities: text search, SQL, JSONPath,
  vector search, hybrid retrieval, analytical scans.
- Vector indexes should store provenance back to document, section, and content
  hash, not just opaque chunks.

Sources:

- https://arrow.apache.org/docs/format/Columnar.html
- https://duckdb.org/docs/stable/data/json/overview
- https://qdrant.tech/documentation/manage-data/collections/
- https://docs.lancedb.com/
- https://github.com/pgvector/pgvector

### Agent Context And Working Memory

The Model Context Protocol gives a useful integration model: resources provide
context/data, tools execute actions, and roots define filesystem or URI
boundaries. LangChain/LangGraph memory docs distinguish short-term,
thread-scoped memory from long-term, namespace-scoped memory, and further split
long-term memory into semantic, episodic, and procedural forms. The MemGPT
paper frames memory management as an operating-system-like problem for LLMs.

Relevance:

- Markitect context caches should be namespace-scoped and explicitly
  activatable.
- A context package should carry text, structure, provenance, policy, freshness,
  and token-budget metadata.
- Agents should be able to drop and reactivate working context by stable id.
- Memory writes need hot-path and background modes.

Sources:

- https://modelcontextprotocol.io/specification/2025-06-18
- https://docs.langchain.com/oss/python/concepts/memory
- https://developers.llamaindex.ai/python/framework/module_guides/deploying/agents/memory/
- https://arxiv.org/abs/2310.08560

### Provenance, Observability, And Debuggability

W3C PROV provides a vocabulary for entities, activities, agents, and
derivations. OpenTelemetry traces provide spans and attributes for observing
distributed or multi-step operations.

Relevance:

- Cache entries should record what produced them.
- Query results should be explainable: source file, section, content hash,
  index backend, policy decision, and transform chain.
- Agent context packs should be auditable.

Sources:

- https://www.w3.org/TR/prov-overview/
- https://opentelemetry.io/docs/concepts/signals/traces/

### Access Control: Fluid To Rigid

Zanzibar demonstrates a relationship-based authorization model at large scale.
OpenFGA and SpiceDB make Zanzibar-style relationship-based access control
available as productized systems. OPA/Rego and Cedar provide policy evaluation
models for attribute and rule-based decisions.

Relevance:

- Markitect should support a fluid-to-rigid access-control ladder.
- Local labs can start with labels and trust scopes.
- Secure deployments need policy checks before query results are returned to
  agents or users.
- Policy decisions should be part of the diagnostic and provenance trail.

Sources:

- https://www.usenix.org/conference/atc19/presentation/pang
- https://openfga.dev/docs/concepts
- https://www.openpolicyagent.org/docs/policy-language
- https://docs.cedarpolicy.com/

## Main Finding

The optional backend should be a **capability-oriented cache fabric**, not a
single database choice.

The slim core should continue to parse, validate, query, transform, and
generate Markdown without persistent infrastructure. The research-lab backend
should attach through explicit interfaces:

- content-addressed snapshots
- index manifests
- query adapter registry
- memory/context package registry
- access policy gateway
- provenance and trace records

That lets the project support spontaneous one-time tool use and also grow into
high-performance, agentic, security-sensitive knowledge systems.

## Most Promising Use Cases

### UC-RL-001: AST Introspection And JSONPath Backend

Expose raw parsed documents for advanced users:

- `mkt ast show`
- `mkt ast query --backend jsonpath`
- raw token and inline query support
- adapter path from simple selectors to JSONPath where possible

Utility:

- debugging parser behavior
- developing transforms
- power-user structural extraction
- migration path for legacy `markitect-main` AST workflows

### UC-RL-002: Local Persistent Knowledge Index

Build a local cache/index for a repo or document collection:

- content-addressed document snapshots
- SQLite JSON tables for structure
- SQLite FTS5 for section/block text search
- optional DuckDB/Arrow export for analytical work
- incremental refresh based on content hashes

Utility:

- fast repeated queries
- search across many Markdown files
- offline/local-first knowledge work
- foundation for batch transforms and generation pipelines

### UC-RL-003: Agent Working Memory Cache

Create activatable context packages for LLM agents:

- namespace-scoped memories
- short-term working sets and long-term caches
- semantic/episodic/procedural memory categories
- drop/reactivate by stable id
- token-budget-aware context assembly
- provenance and freshness metadata

Utility:

- efficient agent work across long projects
- reusable context packs for recurring tasks
- controlled memory updates and recall
- bridge from Markitect documents to agent infrastructure

### UC-RL-004: Access-Controlled Knowledge Gateway

Add policy enforcement to cached retrieval:

- labels/trust zones for local use
- ACL/ReBAC/ABAC adapters for stricter systems
- policy-aware query result filtering
- decision logs and diagnostics
- secure context packages for LLM use

Utility:

- enterprise and IT-security use cases
- multi-tenant knowledge bases
- agent access control
- auditable data exposure

## Design Principles

- The core remains infrastructure-free.
- Backends are optional and capability-declared.
- Every cached object is content-addressed or provenance-addressed.
- Query adapters return the same match/result envelope.
- Policy is checked before data leaves a backend boundary.
- Context packages are explicit, droppable, and reactivatable.
- LLM memory is data with provenance, not invisible prompt residue.
- Experimental backends belong behind stable contracts.
68
docs/workplan-planning-map.md
Normal file
@@ -0,0 +1,68 @@
# Workplan Planning Map

Date: 2026-05-03

## Purpose

This document captures the current sequencing and priority view for
`markitect-tool` workplans.

State Hub currently supports workstream dependency edges, but it does not yet
have native workstream priority/order fields and does not ingest dependency
metadata from workplan frontmatter. Until that exists, this file and the
workplan frontmatter are the repo source of truth; State Hub dependency edges
and descriptions mirror the operational view.

## Priority Scale

| Priority | Meaning |
| --- | --- |
| `P0` | Current mainline work. |
| `P1` | Next enabling architecture or implementation work. |
| `P2` | High-value follow-on work, start when trigger conditions are met. |
| `P3` | Research-lab or security-sensitive extension work. |
| `complete` | Finished foundation or completed decision work. |

## Current Ordering

| Workplan | Priority | Status | Depends On | Current View |
| --- | --- | --- | --- | --- |
| `MKTT-WP-0001` | complete | done | none | Repository foundation is complete. |
| `MKTT-WP-0002` | complete | done | `MKTT-WP-0001` | Legacy scope extraction is complete. |
| `MKTT-WP-0004` | complete | done | `MKTT-WP-0001`, `MKTT-WP-0002` | Contract framework is complete and informs later validation/generation work. |
| `MKTT-WP-0003` | P0 | active | `MKTT-WP-0001`, `MKTT-WP-0002`, `MKTT-WP-0004` | Mainline implementation. Continue with P3.5 transform/compose/include. |
| `MKTT-WP-0006` | P1 | todo | `MKTT-WP-0004`; task-level trigger: `MKTT-WP-0003-T005` | Start after transform/composition shape is clear and before serious cache work. |
| `MKTT-WP-0007` | P2 | todo | `MKTT-WP-0006` | First practical cache backend use case: AST/JSONPath/SQLite/FTS. |
| `MKTT-WP-0005` | P2 | todo | `MKTT-WP-0003`, `MKTT-WP-0004` | Pick up when generation/form/context or semantic assessment pressure appears. |
| `MKTT-WP-0009` | P2 | todo | `MKTT-WP-0006` | Establish access-control gateway before security-sensitive cache/context use. |
| `MKTT-WP-0008` | P3 | todo | `MKTT-WP-0006`, `MKTT-WP-0007`, `MKTT-WP-0009` | Agent working-memory cache after backend and policy floor are available. |

## Dependency Notes

The most important nuance is `MKTT-WP-0006`: it should not wait for every task
in `MKTT-WP-0003`, because it should shape cache architecture before `P3.7`.
It should wait until `MKTT-WP-0003-T005` gives transform/composition enough
shape to know what cached identities and invalidation rules must preserve.

This is a mixed task/workstream dependency. State Hub does not currently model
that natively.

## State Hub Mirror

Native State Hub dependency edges should mirror the whole-workstream
dependencies:

- `MKTT-WP-0002 -> MKTT-WP-0001`
- `MKTT-WP-0004 -> MKTT-WP-0001`
- `MKTT-WP-0004 -> MKTT-WP-0002`
- `MKTT-WP-0003 -> MKTT-WP-0001`
- `MKTT-WP-0003 -> MKTT-WP-0002`
- `MKTT-WP-0003 -> MKTT-WP-0004`
- `MKTT-WP-0006 -> MKTT-WP-0004`
- `MKTT-WP-0007 -> MKTT-WP-0006`
- `MKTT-WP-0005 -> MKTT-WP-0003`
- `MKTT-WP-0005 -> MKTT-WP-0004`
- `MKTT-WP-0009 -> MKTT-WP-0006`
- `MKTT-WP-0008 -> MKTT-WP-0006`
- `MKTT-WP-0008 -> MKTT-WP-0007`
- `MKTT-WP-0008 -> MKTT-WP-0009`
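
The edge list above can be checked for cycles and turned into a valid start order with the standard-library `graphlib` module (Python 3.9+). This is a small sketch; `EDGES` and `start_order` are names introduced here, and the task-level trigger on `MKTT-WP-0003-T005` is intentionally left out because it is not a whole-workstream edge.

```python
from graphlib import TopologicalSorter

# Each pair reads "dependent -> prerequisite", as in the list above.
EDGES = [
    ("MKTT-WP-0002", "MKTT-WP-0001"),
    ("MKTT-WP-0004", "MKTT-WP-0001"),
    ("MKTT-WP-0004", "MKTT-WP-0002"),
    ("MKTT-WP-0003", "MKTT-WP-0001"),
    ("MKTT-WP-0003", "MKTT-WP-0002"),
    ("MKTT-WP-0003", "MKTT-WP-0004"),
    ("MKTT-WP-0006", "MKTT-WP-0004"),
    ("MKTT-WP-0007", "MKTT-WP-0006"),
    ("MKTT-WP-0005", "MKTT-WP-0003"),
    ("MKTT-WP-0005", "MKTT-WP-0004"),
    ("MKTT-WP-0009", "MKTT-WP-0006"),
    ("MKTT-WP-0008", "MKTT-WP-0006"),
    ("MKTT-WP-0008", "MKTT-WP-0007"),
    ("MKTT-WP-0008", "MKTT-WP-0009"),
]


def start_order(edges):
    """Return one order in which every prerequisite precedes its dependents."""
    sorter = TopologicalSorter()
    for dependent, prerequisite in edges:
        sorter.add(dependent, prerequisite)
    # static_order() raises graphlib.CycleError if the edges contain a cycle.
    return list(sorter.static_order())


print(start_order(EDGES))
```

Several orders are valid for this graph, so the result is one admissible sequence rather than a priority ranking; priorities still come from the table in this document.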
@@ -21,6 +21,12 @@ from markitect_tool.contract import (
     validate_contract_file,
 )
 from markitect_tool.diagnostics import Diagnostic, SourceLocation
+from markitect_tool.query import (
+    InvalidQueryError,
+    QueryMatch,
+    extract_document,
+    query_document,
+)
 from markitect_tool.schema import (
     MarkdownSchema,
     SchemaValidationResult,
@@ -55,4 +61,8 @@ __all__ = [
     "validate_contract_file",
     "Diagnostic",
     "SourceLocation",
+    "InvalidQueryError",
+    "QueryMatch",
+    "extract_document",
+    "query_document",
 ]
@@ -16,6 +16,7 @@ from markitect_tool.contract import (
     load_contract_file,
     validate_contract,
 )
+from markitect_tool.query import InvalidQueryError, extract_document, query_document
 from markitect_tool.schema import load_schema_file, validate_markdown_file, validate_schema
 
 
@@ -65,6 +66,60 @@ def metrics(file: Path, output_format: str) -> None:
     _emit_metrics(data, output_format)
 
 
+@main.command()
+@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
+@click.argument("selector")
+@click.option(
+    "--format",
+    "output_format",
+    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
+    default="json",
+    show_default=True,
+)
+def query(file: Path, selector: str, output_format: str) -> None:
+    """Query structured Markdown content with a small selector."""
+
+    document = parse_markdown_file(file)
+    try:
+        matches = query_document(document, selector)
+    except InvalidQueryError as exc:
+        raise click.ClickException(str(exc)) from exc
+    data = {
+        "selector": selector,
+        "document_path": str(file),
+        "count": len(matches),
+        "matches": [match.to_dict() for match in matches],
+    }
+    _emit_query(data, output_format)
+
+
+@main.command()
+@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
+@click.argument("selector")
+@click.option(
+    "--format",
+    "output_format",
+    type=click.Choice(["text", "json", "yaml"], case_sensitive=False),
+    default="text",
+    show_default=True,
+)
+def extract(file: Path, selector: str, output_format: str) -> None:
+    """Extract text or Markdown content from structured Markdown."""
+
+    document = parse_markdown_file(file)
+    try:
+        items = extract_document(document, selector)
+    except InvalidQueryError as exc:
+        raise click.ClickException(str(exc)) from exc
+    data = {
+        "selector": selector,
+        "document_path": str(file),
+        "count": len(items),
+        "items": items,
+    }
+    _emit_extract(data, output_format)
+
+
 @main.command()
 @click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
 @click.option(
@@ -214,5 +269,28 @@ def _emit_metrics(data: dict, output_format: str) -> None:
     )
 
 
+def _emit_query(data: dict, output_format: str) -> None:
+    if output_format == "json":
+        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
+    elif output_format == "yaml":
+        click.echo(yaml.safe_dump(data, sort_keys=False))
+    else:
+        click.echo(f"{data['count']} match(es)")
+        for match in data["matches"]:
+            location = f":{match['line']}" if match.get("line") else ""
+            click.echo(f"- {match['kind']} {match['path']}{location}")
+            if match.get("text"):
+                click.echo(f" {match['text'].splitlines()[0]}")
+
+
+def _emit_extract(data: dict, output_format: str) -> None:
+    if output_format == "json":
+        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
+    elif output_format == "yaml":
+        click.echo(yaml.safe_dump(data, sort_keys=False))
+    else:
+        click.echo("\n\n".join(data["items"]))
+
+
 if __name__ == "__main__":
     main()
15
src/markitect_tool/query/__init__.py
Normal file
@@ -0,0 +1,15 @@
|
|||||||
|
"""Query and extraction helpers for parsed Markdown documents."""
|
||||||
|
|
||||||
|
from markitect_tool.query.engine import (
|
||||||
|
InvalidQueryError,
|
||||||
|
QueryMatch,
|
||||||
|
extract_document,
|
||||||
|
query_document,
|
||||||
|
)
|
||||||
|
|
||||||
|
__all__ = [
|
||||||
|
"InvalidQueryError",
|
||||||
|
"QueryMatch",
|
||||||
|
"extract_document",
|
||||||
|
"query_document",
|
||||||
|
]
|
||||||
242
src/markitect_tool/query/engine.py
Normal file
@@ -0,0 +1,242 @@
|
|||||||
|
"""Small selector engine for structured Markdown documents."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
from markitect_tool.contract import collect_metrics
|
||||||
|
from markitect_tool.core import ContentBlock, Document, Heading, Section
|
||||||
|
|
||||||
|
|
||||||
|
class InvalidQueryError(ValueError):
|
||||||
|
"""Raised when a selector cannot be parsed or evaluated."""
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class QueryMatch:
|
||||||
|
"""One match returned by a selector."""
|
||||||
|
|
||||||
|
kind: str
|
||||||
|
path: str
|
||||||
|
value: Any
|
||||||
|
text: str | None = None
|
||||||
|
line: int | None = None
|
||||||
|
|
||||||
|
def to_dict(self) -> dict[str, Any]:
|
||||||
|
data = {
|
||||||
|
"kind": self.kind,
|
||||||
|
"path": self.path,
|
||||||
|
"value": self.value,
|
||||||
|
"text": self.text,
|
||||||
|
"line": self.line,
|
||||||
|
}
|
||||||
|
return {key: value for key, value in data.items() if value is not None}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class _Selector:
|
||||||
|
target: str
|
||||||
|
path: list[str]
|
||||||
|
filters: dict[str, str]
|
||||||
|
|
||||||
|
|
||||||
|
def query_document(document: Document, selector: str) -> list[QueryMatch]:
|
||||||
|
"""Query a parsed document with a small Markitect selector."""
|
||||||
|
|
||||||
|
parsed = _parse_selector(selector)
|
||||||
|
if parsed.target in {"document", "$", "."}:
|
||||||
|
return [QueryMatch(kind="document", path="$", value=document.to_dict())]
|
||||||
|
if parsed.target == "frontmatter":
|
||||||
|
return _query_mapping(document.frontmatter, parsed.path, "frontmatter", "$.frontmatter")
|
||||||
|
if parsed.target == "headings":
|
||||||
|
return _query_headings(document.headings, parsed.filters)
|
||||||
|
if parsed.target == "sections":
|
||||||
|
return _query_sections(document.sections, parsed.filters)
|
||||||
|
if parsed.target == "blocks":
|
||||||
|
return _query_blocks(document.blocks, parsed.filters)
|
||||||
|
if parsed.target == "metrics":
|
||||||
|
return _query_mapping(collect_metrics(document).to_dict(), parsed.path, "metrics", "$.metrics")
|
||||||
|
raise InvalidQueryError(f"Unsupported selector target `{parsed.target}`")
|
||||||
|
|
||||||
|
|
||||||
|
def extract_document(document: Document, selector: str) -> list[str]:
|
||||||
|
"""Extract text content from query matches."""
|
||||||
|
|
||||||
|
extracted: list[str] = []
|
||||||
|
for match in query_document(document, selector):
|
||||||
|
if match.text is not None:
|
||||||
|
extracted.append(match.text)
|
||||||
|
elif isinstance(match.value, str):
|
||||||
|
extracted.append(match.value)
|
||||||
|
elif isinstance(match.value, int | float | bool):
|
||||||
|
extracted.append(str(match.value))
|
||||||
|
return extracted
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_selector(selector: str) -> _Selector:
|
||||||
|
raw = selector.strip()
|
||||||
|
if not raw:
|
||||||
|
raise InvalidQueryError("Selector cannot be empty")
|
||||||
|
|
||||||
|
filters: dict[str, str] = {}
|
||||||
|
base = raw
|
||||||
|
if "[" in raw or "]" in raw:
|
||||||
|
if not raw.endswith("]") or raw.count("[") != 1 or raw.count("]") != 1:
|
||||||
|
raise InvalidQueryError(f"Invalid selector filter syntax `{selector}`")
|
||||||
|
base, raw_filter = raw[:-1].split("[", 1)
|
||||||
|
filters = _parse_filters(raw_filter)
|
||||||
|
|
||||||
|
parts = [part for part in base.split(".") if part]
|
||||||
|
if not parts:
|
||||||
|
return _Selector(target="document", path=[], filters=filters)
|
||||||
|
return _Selector(target=parts[0], path=parts[1:], filters=filters)
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_filters(raw_filter: str) -> dict[str, str]:
    filters: dict[str, str] = {}
    for raw_part in raw_filter.split(","):
        part = raw_part.strip()
        if not part:
            continue
        # `~=` is a case-insensitive "contains" match; `=` is an exact match.
        operator = "~=" if "~=" in part else "="
        if operator not in part:
            raise InvalidQueryError(f"Invalid filter `{part}`")
        key, value = part.split(operator, 1)
        key = key.strip()
        # Reject an empty key before the `~` suffix is appended; otherwise
        # a filter such as `[~=x]` would be accepted with the key `~`.
        if not key:
            raise InvalidQueryError(f"Invalid filter `{part}`")
        if operator == "~=":
            key = f"{key}~"
        filters[key] = _strip_quotes(value.strip())
    return filters
|
||||||
|
|
||||||
|
|
||||||
|
def _query_mapping(
|
||||||
|
mapping: dict[str, Any],
|
||||||
|
path: list[str],
|
||||||
|
kind: str,
|
||||||
|
root_path: str,
|
||||||
|
) -> list[QueryMatch]:
|
||||||
|
if not path:
|
||||||
|
return [QueryMatch(kind=kind, path=root_path, value=mapping)]
|
||||||
|
value: Any = mapping
|
||||||
|
current_path = root_path
|
||||||
|
for part in path:
|
||||||
|
current_path = f"{current_path}.{part}"
|
||||||
|
if isinstance(value, dict) and part in value:
|
||||||
|
value = value[part]
|
||||||
|
else:
|
||||||
|
return []
|
||||||
|
return [QueryMatch(kind=kind, path=current_path, value=value, text=_text_value(value))]
|
||||||
|
|
||||||
|
|
||||||
|
def _query_headings(headings: list[Heading], filters: dict[str, str]) -> list[QueryMatch]:
|
||||||
|
matches: list[QueryMatch] = []
|
||||||
|
for index, heading in enumerate(headings):
|
||||||
|
if not _match_heading(heading, filters):
|
||||||
|
continue
|
||||||
|
matches.append(
|
||||||
|
QueryMatch(
|
||||||
|
kind="heading",
|
||||||
|
path=f"$.headings[{index}]",
|
||||||
|
value=heading.to_dict(),
|
||||||
|
text=f"{'#' * heading.level} {heading.text}",
|
||||||
|
line=heading.line,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
return matches
|
||||||
|
|
||||||
|
|
||||||
|
def _query_sections(sections: list[Section], filters: dict[str, str]) -> list[QueryMatch]:
|
||||||
|
matches: list[QueryMatch] = []
|
||||||
|
for index, section in enumerate(sections):
|
||||||
|
if not _match_section(section, filters):
|
||||||
|
continue
|
||||||
|
matches.append(
|
||||||
|
QueryMatch(
|
||||||
|
kind="section",
|
||||||
|
path=f"$.sections[{index}]",
|
||||||
|
value=section.to_dict(),
|
||||||
|
text=_section_markdown(section),
|
||||||
|
line=section.heading.line,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
return matches
|
||||||
|
|
||||||
|
|
||||||
|
def _query_blocks(blocks: list[ContentBlock], filters: dict[str, str]) -> list[QueryMatch]:
|
||||||
|
matches: list[QueryMatch] = []
|
||||||
|
for index, block in enumerate(blocks):
|
||||||
|
if not _match_block(block, filters):
|
||||||
|
continue
|
||||||
|
matches.append(
|
||||||
|
QueryMatch(
|
||||||
|
kind="block",
|
||||||
|
path=f"$.blocks[{index}]",
|
||||||
|
value=block.to_dict(),
|
||||||
|
text=block.text,
|
||||||
|
line=block.line_start,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
return matches
|
||||||
|
|
||||||
|
|
||||||
|
def _match_heading(heading: Heading, filters: dict[str, str]) -> bool:
|
||||||
|
for key, expected in filters.items():
|
||||||
|
if key == "level" and str(heading.level) != expected:
|
||||||
|
return False
|
||||||
|
if key in {"text", "heading", "title"} and heading.text != expected:
|
||||||
|
return False
|
||||||
|
if key in {"text~", "heading~", "title~"} and expected.lower() not in heading.text.lower():
|
||||||
|
return False
|
||||||
|
return True
|
||||||
|
|
||||||
|
|
||||||
|
def _match_section(section: Section, filters: dict[str, str]) -> bool:
|
||||||
|
section_text = "\n".join(block.text for block in section.blocks if block.text)
|
||||||
|
for key, expected in filters.items():
|
||||||
|
if key == "level" and str(section.heading.level) != expected:
|
||||||
|
return False
|
||||||
|
if key in {"heading", "title", "text"} and section.heading.text != expected:
|
||||||
|
return False
|
||||||
|
if key in {"heading~", "title~", "text~"} and expected.lower() not in section.heading.text.lower():
|
||||||
|
return False
|
||||||
|
if key == "contains" and expected not in section_text:
|
||||||
|
return False
|
||||||
|
if key == "contains~" and expected.lower() not in section_text.lower():
|
||||||
|
return False
|
||||||
|
return True
|
||||||
|
|
||||||
|
|
||||||
|
def _match_block(block: ContentBlock, filters: dict[str, str]) -> bool:
|
||||||
|
for key, expected in filters.items():
|
||||||
|
if key == "type" and block.type != expected:
|
||||||
|
return False
|
||||||
|
if key == "contains" and expected not in block.text:
|
||||||
|
return False
|
||||||
|
if key == "contains~" and expected.lower() not in block.text.lower():
|
||||||
|
return False
|
||||||
|
return True
|
||||||
|
|
||||||
|
|
||||||
|
def _section_markdown(section: Section) -> str:
|
||||||
|
lines = [f"{'#' * section.heading.level} {section.heading.text}"]
|
||||||
|
for block in section.blocks:
|
||||||
|
if block.text:
|
||||||
|
lines.extend(["", block.text])
|
||||||
|
return "\n".join(lines).strip()
|
||||||
|
|
||||||
|
|
||||||
|
def _strip_quotes(value: str) -> str:
|
||||||
|
if len(value) >= 2 and value[0] == value[-1] and value[0] in {'"', "'"}:
|
||||||
|
return value[1:-1]
|
||||||
|
return value
|
||||||
|
|
||||||
|
|
||||||
|
def _text_value(value: Any) -> str | None:
|
||||||
|
if isinstance(value, str):
|
||||||
|
return value
|
||||||
|
if isinstance(value, int | float | bool):
|
||||||
|
return str(value)
|
||||||
|
return None
|
||||||
148
tests/test_query_extraction.py
Normal file
@@ -0,0 +1,148 @@
|
|||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
from click.testing import CliRunner
|
||||||
|
|
||||||
|
from markitect_tool.cli import main
|
||||||
|
from markitect_tool.core import parse_markdown
|
||||||
|
from markitect_tool.query import InvalidQueryError, extract_document, query_document
|
||||||
|
|
||||||
|
|
||||||
|
QUERY_DOC = """---
|
||||||
|
document_type: adr
|
||||||
|
status: accepted
|
||||||
|
nested:
  owner: Platform
|
||||||
|
---
|
||||||
|
|
||||||
|
# Use Query Selectors
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
The problem is that authors need predictable extraction from Markdown.
|
||||||
|
|
||||||
|
## Decision
|
||||||
|
|
||||||
|
We will use a small selector language before adopting a larger query backend.
|
||||||
|
|
||||||
|
## Consequences
|
||||||
|
|
||||||
|
- Queries remain readable.
|
||||||
|
- Extraction can feed later transforms.
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
|
def test_query_frontmatter_path():
|
||||||
|
document = parse_markdown(QUERY_DOC)
|
||||||
|
|
||||||
|
matches = query_document(document, "frontmatter.nested.owner")
|
||||||
|
|
||||||
|
assert len(matches) == 1
|
||||||
|
assert matches[0].kind == "frontmatter"
|
||||||
|
assert matches[0].path == "$.frontmatter.nested.owner"
|
||||||
|
assert matches[0].text == "Platform"
|
||||||
|
|
||||||
|
|
||||||
|
def test_query_headings_by_level():
|
||||||
|
document = parse_markdown(QUERY_DOC)
|
||||||
|
|
||||||
|
matches = query_document(document, "headings[level=2]")
|
||||||
|
|
||||||
|
assert [match.value["text"] for match in matches] == [
|
||||||
|
"Context",
|
||||||
|
"Decision",
|
||||||
|
"Consequences",
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
def test_query_sections_by_exact_heading():
|
||||||
|
document = parse_markdown(QUERY_DOC)
|
||||||
|
|
||||||
|
matches = query_document(document, "sections[heading=Decision]")
|
||||||
|
|
||||||
|
assert len(matches) == 1
|
||||||
|
assert matches[0].kind == "section"
|
||||||
|
assert matches[0].line == 14
|
||||||
|
assert matches[0].text.startswith("## Decision")
|
||||||
|
assert "small selector language" in matches[0].text
|
||||||
|
|
||||||
|
|
||||||
|
def test_query_sections_by_case_insensitive_contains():
|
||||||
|
document = parse_markdown(QUERY_DOC)
|
||||||
|
|
||||||
|
matches = query_document(document, "sections[contains~=TRANSFORMS]")
|
||||||
|
|
||||||
|
assert [match.value["heading"]["text"] for match in matches] == ["Consequences"]
|
||||||
|
|
||||||
|
|
||||||
|
def test_query_blocks_by_type():
|
||||||
|
document = parse_markdown(QUERY_DOC)
|
||||||
|
|
||||||
|
matches = query_document(document, "blocks[type=bullet_list]")
|
||||||
|
|
||||||
|
assert len(matches) == 1
|
||||||
|
assert "Queries remain readable" in matches[0].text
|
||||||
|
|
||||||
|
|
||||||
|
def test_query_metrics_path():
|
||||||
|
document = parse_markdown(QUERY_DOC)
|
||||||
|
|
||||||
|
matches = query_document(document, "metrics.document.sections")
|
||||||
|
|
||||||
|
assert matches[0].value == 4
|
||||||
|
assert matches[0].text == "4"
|
||||||
|
|
||||||
|
|
||||||
|
def test_extract_document_returns_textual_matches():
|
||||||
|
document = parse_markdown(QUERY_DOC)
|
||||||
|
|
||||||
|
extracted = extract_document(document, "sections[heading=Context]")
|
||||||
|
|
||||||
|
assert extracted == [
|
||||||
|
"## Context\n\nThe problem is that authors need predictable extraction from Markdown."
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
def test_invalid_query_reports_error():
|
||||||
|
document = parse_markdown(QUERY_DOC)
|
||||||
|
|
||||||
|
with pytest.raises(InvalidQueryError):
|
||||||
|
query_document(document, "sections[heading")
|
||||||
|
|
||||||
|
|
||||||
|
def test_mkt_query_outputs_json(tmp_path: Path):
|
||||||
|
source = tmp_path / "doc.md"
|
||||||
|
source.write_text(QUERY_DOC, encoding="utf-8")
|
||||||
|
|
||||||
|
result = CliRunner().invoke(
|
||||||
|
main, ["query", str(source), "sections[heading=Decision]"]
|
||||||
|
)
|
||||||
|
|
||||||
|
assert result.exit_code == 0
|
||||||
|
assert '"count": 1' in result.output
|
||||||
|
assert "Decision" in result.output
|
||||||
|
|
||||||
|
|
||||||
|
def test_mkt_query_outputs_text(tmp_path: Path):
|
||||||
|
source = tmp_path / "doc.md"
|
||||||
|
source.write_text(QUERY_DOC, encoding="utf-8")
|
||||||
|
|
||||||
|
result = CliRunner().invoke(
|
||||||
|
main, ["query", str(source), "headings[level=2]", "--format", "text"]
|
||||||
|
)
|
||||||
|
|
||||||
|
assert result.exit_code == 0
|
||||||
|
assert "3 match(es)" in result.output
|
||||||
|
assert "## Context" in result.output
|
||||||
|
|
||||||
|
|
||||||
|
def test_mkt_extract_outputs_text(tmp_path: Path):
|
||||||
|
source = tmp_path / "doc.md"
|
||||||
|
source.write_text(QUERY_DOC, encoding="utf-8")
|
||||||
|
|
||||||
|
result = CliRunner().invoke(
|
||||||
|
main, ["extract", str(source), "frontmatter.status"]
|
||||||
|
)
|
||||||
|
|
||||||
|
assert result.exit_code == 0
|
||||||
|
assert result.output.strip() == "accepted"
|
||||||
@@ -6,6 +6,9 @@ domain: markitect
|
|||||||
status: done
|
status: done
|
||||||
owner: markitect-tool
|
owner: markitect-tool
|
||||||
topic_slug: markitect
|
topic_slug: markitect
|
||||||
|
planning_priority: complete
|
||||||
|
planning_order: 10
|
||||||
|
depends_on_workplans: []
|
||||||
created: "2026-05-03"
|
created: "2026-05-03"
|
||||||
updated: "2026-05-03"
|
updated: "2026-05-03"
|
||||||
state_hub_workstream_id: "4d405d74-faec-440e-873e-692ff9ca96e7"
|
state_hub_workstream_id: "4d405d74-faec-440e-873e-692ff9ca96e7"
|
||||||
|
|||||||
@@ -6,6 +6,10 @@ domain: markitect
|
|||||||
status: done
|
status: done
|
||||||
owner: markitect-tool
|
owner: markitect-tool
|
||||||
topic_slug: markitect
|
topic_slug: markitect
|
||||||
|
planning_priority: complete
|
||||||
|
planning_order: 20
|
||||||
|
depends_on_workplans:
|
||||||
|
- MKTT-WP-0001
|
||||||
created: "2026-05-03"
|
created: "2026-05-03"
|
||||||
updated: "2026-05-03"
|
updated: "2026-05-03"
|
||||||
state_hub_workstream_id: "0fe54d2c-d579-4b03-a647-7a15bb835893"
|
state_hub_workstream_id: "0fe54d2c-d579-4b03-a647-7a15bb835893"
|
||||||
|
|||||||
@@ -6,6 +6,12 @@ domain: markitect
|
|||||||
status: active
|
status: active
|
||||||
owner: markitect-tool
|
owner: markitect-tool
|
||||||
topic_slug: markitect
|
topic_slug: markitect
|
||||||
|
planning_priority: P0
|
||||||
|
planning_order: 40
|
||||||
|
depends_on_workplans:
|
||||||
|
- MKTT-WP-0001
|
||||||
|
- MKTT-WP-0002
|
||||||
|
- MKTT-WP-0004
|
||||||
created: "2026-05-03"
|
created: "2026-05-03"
|
||||||
updated: "2026-05-03"
|
updated: "2026-05-03"
|
||||||
state_hub_workstream_id: "9fefb57d-985e-4125-8daf-03554844f45e"
|
state_hub_workstream_id: "9fefb57d-985e-4125-8daf-03554844f45e"
|
||||||
@@ -67,7 +73,7 @@ validation, structured violations, `mkt validate`, and `mkt schema validate`.
|
|||||||
|
|
||||||
```task
|
```task
|
||||||
id: MKTT-WP-0003-T004
|
id: MKTT-WP-0003-T004
|
||||||
status: todo
|
status: done
|
||||||
priority: medium
|
priority: medium
|
||||||
state_hub_task_id: "e4f72218-601e-488f-a5df-171b91a747d2"
|
state_hub_task_id: "e4f72218-601e-488f-a5df-171b91a747d2"
|
||||||
```
|
```
|
||||||
@@ -75,6 +81,10 @@ state_hub_task_id: "e4f72218-601e-488f-a5df-171b91a747d2"
|
|||||||
Implement FR-030 and FR-031 over the structured representation. Start with a
|
Implement FR-030 and FR-031 over the structured representation. Start with a
|
||||||
small query language or JSONPath-like adapter only if it remains simple.
|
small query language or JSONPath-like adapter only if it remains simple.
|
||||||
|
|
||||||
|
The initial implementation is complete for simple selectors over frontmatter,
headings, sections, blocks, and metrics, available through the API and through
`mkt query` and `mkt extract`.
|
||||||
|
|
||||||
## P3.5 - Implement transform, compose, and include primitives
|
## P3.5 - Implement transform, compose, and include primitives
|
||||||
|
|
||||||
```task
|
```task
|
||||||
|
|||||||
@@ -6,6 +6,11 @@ domain: markitect
|
|||||||
status: done
|
status: done
|
||||||
owner: markitect-tool
|
owner: markitect-tool
|
||||||
topic_slug: markitect
|
topic_slug: markitect
|
||||||
|
planning_priority: complete
|
||||||
|
planning_order: 30
|
||||||
|
depends_on_workplans:
|
||||||
|
- MKTT-WP-0001
|
||||||
|
- MKTT-WP-0002
|
||||||
created: "2026-05-03"
|
created: "2026-05-03"
|
||||||
updated: "2026-05-03"
|
updated: "2026-05-03"
|
||||||
state_hub_workstream_id: "558787e1-d287-46a5-9214-634e8b90a858"
|
state_hub_workstream_id: "558787e1-d287-46a5-9214-634e8b90a858"
|
||||||
|
|||||||
@@ -6,6 +6,11 @@ domain: markitect
|
|||||||
status: todo
|
status: todo
|
||||||
owner: markitect-tool
|
owner: markitect-tool
|
||||||
topic_slug: markitect
|
topic_slug: markitect
|
||||||
|
planning_priority: P2
|
||||||
|
planning_order: 70
|
||||||
|
depends_on_workplans:
|
||||||
|
- MKTT-WP-0003
|
||||||
|
- MKTT-WP-0004
|
||||||
created: "2026-05-03"
|
created: "2026-05-03"
|
||||||
updated: "2026-05-03"
|
updated: "2026-05-03"
|
||||||
state_hub_workstream_id: "7918687e-2364-46b1-ab7e-65aa77cb8449"
|
state_hub_workstream_id: "7918687e-2364-46b1-ab7e-65aa77cb8449"
|
||||||
|
|||||||
133
workplans/MKTT-WP-0006-cache-backend-architecture-core.md
Normal file
@@ -0,0 +1,133 @@
|
|||||||
|
---
|
||||||
|
id: MKTT-WP-0006
|
||||||
|
type: workplan
|
||||||
|
title: "Optional Cache Backend Architecture Core"
|
||||||
|
domain: markitect
|
||||||
|
status: todo
|
||||||
|
owner: markitect-tool
|
||||||
|
topic_slug: markitect
|
||||||
|
planning_priority: P1
|
||||||
|
planning_order: 50
|
||||||
|
depends_on_workplans:
|
||||||
|
- MKTT-WP-0004
|
||||||
|
depends_on_tasks:
|
||||||
|
- MKTT-WP-0003-T005
|
||||||
|
created: "2026-05-03"
|
||||||
|
updated: "2026-05-03"
|
||||||
|
state_hub_workstream_id: "0c585f8a-5c7e-4c89-b785-5b0089180256"
|
||||||
|
---
|
||||||
|
|
||||||
|
# MKTT-WP-0006: Optional Cache Backend Architecture Core
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
Create the optional backend fabric that lets `markitect-tool` attach cached
ASTs, indexes, query adapters, context packages, and policy gateways without
making persistent infrastructure mandatory for core CLI use.
|
||||||
|
|
||||||
|
## Background
|
||||||
|
|
||||||
|
Research and architecture are captured in:
|
||||||
|
|
||||||
|
- `docs/research-lab-cache-backend-research.md`
|
||||||
|
- `docs/cache-backend-architecture-blueprint.md`
|
||||||
|
|
||||||
|
## Decision
|
||||||
|
|
||||||
|
Do not start this workplan until the current deterministic
transform/composition slice has enough shape to show what cache invalidation
must preserve. Start it before WP-0003 P3.7 caching becomes implementation work.
|
||||||
|
|
||||||
|
## P6.1 - Define backend capability model
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0006-T001
|
||||||
|
status: todo
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "8c04f146-942c-45b8-9a7b-3bd61916aa4b"
|
||||||
|
```
|
||||||
|
|
||||||
|
Define capability names, backend manifests, and compatibility checks for:
|
||||||
|
|
||||||
|
- snapshots
|
||||||
|
- JSON/AST query
|
||||||
|
- full-text search
|
||||||
|
- SQL
|
||||||
|
- vector/hybrid search
|
||||||
|
- context packages
|
||||||
|
- policy enforcement
|
||||||
|
- provenance
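
A capability model of this kind could be sketched as follows. This is only an illustration under stated assumptions: the capability identifiers, the `BackendManifest` class, and `check_compatibility` are hypothetical names chosen here, not part of `markitect-tool`.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical capability identifiers, one per item in the list above.
KNOWN_CAPABILITIES = frozenset(
    {
        "snapshots",
        "json_ast_query",
        "full_text_search",
        "sql",
        "vector_hybrid_search",
        "context_packages",
        "policy_enforcement",
        "provenance",
    }
)


@dataclass(frozen=True)
class BackendManifest:
    """Illustrative manifest: a backend id plus the capabilities it declares."""

    backend_id: str
    capabilities: frozenset[str]


def check_compatibility(manifest: BackendManifest, required: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means compatible."""
    # Capabilities the manifest declares but the model does not know about.
    problems = [
        f"unknown capability: {name}"
        for name in sorted(manifest.capabilities - KNOWN_CAPABILITIES)
    ]
    # Capabilities the caller needs but the backend does not declare.
    problems += [
        f"missing capability: {name}"
        for name in sorted(set(required) - manifest.capabilities)
    ]
    return problems
```

Returning a list of problems instead of a boolean keeps the check usable for diagnostics, which may fit the unified diagnostics goal of the core.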
|
||||||
|
|
||||||
|
## P6.2 - Define snapshot model and content identity
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0006-T002
|
||||||
|
status: todo
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "5debc135-908a-47ed-ba15-564610970e38"
|
||||||
|
```
|
||||||
|
|
||||||
|
Specify content-addressed document snapshots keyed by source content hash,
parser version, parse options, and contract version where relevant.
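
One way such a snapshot key might be derived is shown below. It is a minimal sketch, assuming SHA-256 and canonical JSON as the identity encoding; the function name `snapshot_key` and its parameters are hypothetical.

```python
from __future__ import annotations

import hashlib
import json


def snapshot_key(
    source: bytes,
    parser_version: str,
    parse_options: dict,
    contract_version: str | None = None,
) -> str:
    """Derive a stable identity from everything that can change the parse."""
    identity = {
        "content_sha256": hashlib.sha256(source).hexdigest(),
        "parser_version": parser_version,
        "parse_options": parse_options,
        "contract_version": contract_version,
    }
    # Sorted keys and fixed separators make the encoding independent of
    # dictionary insertion order.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the parser version and parse options are part of the key, upgrading the parser yields new keys instead of silently reusing stale snapshots.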
|
||||||
|
|
||||||
|
## P6.3 - Define backend interfaces
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0006-T003
|
||||||
|
status: todo
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "a3e37112-1197-4f6f-8de8-7b3067ef060e"
|
||||||
|
```
|
||||||
|
|
||||||
|
Add protocol classes for snapshot backends, index backends, query adapters,
context package registries, and access policy gateways.
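
The interfaces could take the shape of `typing.Protocol` classes, roughly as sketched here. Only three of the five interfaces are shown, and every method name and signature is an assumption made for illustration; the remaining two would presumably follow the same pattern.

```python
from __future__ import annotations

from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SnapshotBackend(Protocol):
    """Stores and retrieves parsed document snapshots by key."""

    def get(self, key: str) -> dict[str, Any] | None: ...

    def put(self, key: str, snapshot: dict[str, Any]) -> None: ...


@runtime_checkable
class QueryAdapter(Protocol):
    """Evaluates a query expression against one snapshot."""

    def query(self, snapshot: dict[str, Any], expression: str) -> list[dict[str, Any]]: ...


@runtime_checkable
class AccessPolicyGateway(Protocol):
    """Decides whether a principal may read a resource."""

    def is_allowed(self, principal: str, resource: str) -> bool: ...


class InMemorySnapshotBackend:
    """Example implementation; satisfies SnapshotBackend structurally."""

    def __init__(self) -> None:
        self._items: dict[str, dict[str, Any]] = {}

    def get(self, key: str) -> dict[str, Any] | None:
        return self._items.get(key)

    def put(self, key: str, snapshot: dict[str, Any]) -> None:
        self._items[key] = snapshot
```

Structural typing means a backend does not need to import or subclass anything from the core, which supports keeping the backend fabric optional.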
|
||||||
|
|
||||||
|
## P6.4 - Implement local backend registry
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0006-T004
|
||||||
|
status: todo
|
||||||
|
priority: medium
|
||||||
|
state_hub_task_id: "6c9b8765-4d14-436d-a2c9-c028a31aaade"
|
||||||
|
```
|
||||||
|
|
||||||
|
Load backend manifests from project config and expose registered capabilities,
importing optional dependencies only when a backend is actually used.
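
Deferred importing of this kind might look like the following sketch. The `BackendEntry` and `BackendRegistry` names and the `module:attribute` factory string are assumptions for illustration, not an existing `markitect-tool` API.

```python
from __future__ import annotations

import importlib
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class BackendEntry:
    """Hypothetical manifest entry as it might be read from project config."""

    backend_id: str
    capabilities: tuple[str, ...]
    factory: str  # "module:attribute"; resolved only when the backend is loaded


class BackendRegistry:
    def __init__(self, entries: list[BackendEntry]) -> None:
        self._entries = {entry.backend_id: entry for entry in entries}

    def capabilities(self) -> dict[str, tuple[str, ...]]:
        """List declared capabilities without importing any backend module."""
        return {
            backend_id: entry.capabilities
            for backend_id, entry in self._entries.items()
        }

    def load(self, backend_id: str) -> Any:
        """Import the backend module on first use and return its factory."""
        entry = self._entries[backend_id]
        module_name, _, attribute = entry.factory.partition(":")
        module = importlib.import_module(module_name)
        return getattr(module, attribute)
```

With this split, listing backends stays cheap and keeps working when an optional dependency is missing; the `ImportError` surfaces only when that backend is loaded.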
|
||||||
|
|
||||||
|
## P6.5 - Add provenance envelope
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0006-T005
|
||||||
|
status: todo
|
||||||
|
priority: medium
|
||||||
|
state_hub_task_id: "7b551eae-99c8-4c8a-b781-18d59d318707"
|
||||||
|
```
|
||||||
|
|
||||||
|
Add provenance metadata shared by snapshots, query results, context packages,
and diagnostics.
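
A shared envelope could be as small as the sketch below. The field names and the `with_provenance` helper are hypothetical; the actual envelope may carry different or additional fields.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class Provenance:
    """Illustrative provenance record; all field names are assumptions."""

    source_path: str
    content_sha256: str
    parser_version: str
    produced_by: str  # e.g. the command or adapter that produced the result
    produced_at: str = field(default_factory=_utc_now)


def with_provenance(payload: dict, provenance: Provenance) -> dict:
    """Wrap any result (snapshot, query result, diagnostic) in one envelope."""
    return {"provenance": asdict(provenance), "payload": payload}
```

Using one wrapper for every result type is what would make provenance "represented consistently", as the exit criteria ask.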
|
||||||
|
|
||||||
|
## P6.6 - Add CLI scaffolding
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0006-T006
|
||||||
|
status: todo
|
||||||
|
priority: medium
|
||||||
|
state_hub_task_id: "921e589c-8b0d-4eeb-8834-4a4c6c73da65"
|
||||||
|
```
|
||||||
|
|
||||||
|
Add read-only commands:
|
||||||
|
|
||||||
|
```text
|
||||||
|
mkt backend list
|
||||||
|
mkt backend inspect <id>
|
||||||
|
mkt cache status
|
||||||
|
```
|
||||||
|
|
||||||
|
No persistent write behavior is required in this task.
|
||||||
|
|
||||||
|
## Exit Criteria
|
||||||
|
|
||||||
|
- Core CLI still works without any backend.
|
||||||
|
- Backends can declare capabilities in Markdown/YAML manifests.
|
||||||
|
- Query and future cache work can target backend interfaces.
|
||||||
|
- Provenance is represented consistently.
|
||||||
125
workplans/MKTT-WP-0007-advanced-query-and-local-index-backend.md
Normal file
@@ -0,0 +1,125 @@
|
|||||||
|
---
|
||||||
|
id: MKTT-WP-0007
|
||||||
|
type: workplan
|
||||||
|
title: "Advanced Query and Local Index Backend"
|
||||||
|
domain: markitect
|
||||||
|
status: todo
|
||||||
|
owner: markitect-tool
|
||||||
|
topic_slug: markitect
|
||||||
|
planning_priority: P2
|
||||||
|
planning_order: 60
|
||||||
|
depends_on_workplans:
|
||||||
|
- MKTT-WP-0006
|
||||||
|
created: "2026-05-03"
|
||||||
|
updated: "2026-05-03"
|
||||||
|
state_hub_workstream_id: "d61a82e4-651a-4df2-944a-9ff996b2e1f6"
|
||||||
|
---
|
||||||
|
|
||||||
|
# MKTT-WP-0007: Advanced Query and Local Index Backend
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
Implement the first practical backend use case: cached AST introspection,
JSONPath querying, SQLite metadata, and FTS5 search over Markdown documents.
|
||||||
|
|
||||||
|
## P7.1 - Implement local snapshot store
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0007-T001
|
||||||
|
status: todo
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "8894a9a4-586c-457b-b4e6-add8276ff5f2"
|
||||||
|
```
|
||||||
|
|
||||||
|
Persist parsed document snapshots and source metadata in a local cache
directory.
|
||||||
|
|
||||||
|
## P7.2 - Add AST introspection commands
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0007-T002
|
||||||
|
status: todo
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "fb9eaa9d-5c20-49a9-a7a6-acae28ac5e20"
|
||||||
|
```
|
||||||
|
|
||||||
|
Add:
|
||||||
|
|
||||||
|
```text
|
||||||
|
mkt ast show <file>
|
||||||
|
mkt ast stats <file>
|
||||||
|
```
|
||||||
|
|
||||||
|
Use the current parsed document and token model. Do not require a cache for
single-file use.
|
||||||
|
|
||||||
|
## P7.3 - Add optional JSONPath query adapter
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0007-T003
|
||||||
|
status: todo
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "a7b46b32-f322-4fe0-a6fb-60b0b823593c"
|
||||||
|
```
|
||||||
|
|
||||||
|
Support JSONPath over `Document.to_dict()` as an optional dependency, returning
results in the shared query result envelope.
|
||||||
|
|
||||||
|
## P7.4 - Build SQLite metadata and JSON index
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0007-T004
|
||||||
|
status: todo
|
||||||
|
priority: medium
|
||||||
|
state_hub_task_id: "479f11a3-4ab4-451b-991c-7f143f2bffea"
|
||||||
|
```
|
||||||
|
|
||||||
|
Persist source files, content hashes, frontmatter, headings, sections, blocks,
and metrics in SQLite.
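
A reduced version of such an index, covering only documents and headings, could be sketched with the standard `sqlite3` module. The table layout and the `index_document` helper are assumptions for illustration; sections, blocks, and metrics would need further tables.

```python
from __future__ import annotations

import json
import sqlite3

# Hypothetical schema; frontmatter is stored as a JSON string.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    path TEXT PRIMARY KEY,
    content_sha256 TEXT NOT NULL,
    frontmatter_json TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS headings (
    path TEXT NOT NULL,
    position INTEGER NOT NULL,
    level INTEGER NOT NULL,
    text TEXT NOT NULL,
    line INTEGER
);
"""


def index_document(
    conn: sqlite3.Connection,
    path: str,
    content_sha256: str,
    frontmatter: dict,
    headings: list[dict],
) -> None:
    """Replace all indexed rows for one document in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM headings WHERE path = ?", (path,))
        conn.execute(
            "INSERT OR REPLACE INTO documents VALUES (?, ?, ?)",
            (path, content_sha256, json.dumps(frontmatter, sort_keys=True)),
        )
        conn.executemany(
            "INSERT INTO headings VALUES (?, ?, ?, ?, ?)",
            [
                (path, position, h["level"], h["text"], h.get("line"))
                for position, h in enumerate(headings)
            ],
        )
```

Deleting and reinserting per document keeps re-indexing idempotent, so running it twice on the same file leaves one set of rows.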
|
||||||
|
|
||||||
|
## P7.5 - Add FTS5 section/block search
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0007-T005
|
||||||
|
status: todo
|
||||||
|
priority: medium
|
||||||
|
state_hub_task_id: "0f03e9be-b6f0-4e4b-8220-3bbf638a892b"
|
||||||
|
```
|
||||||
|
|
||||||
|
Add full-text search over section and block text with source spans and
relevance ranking.
|
||||||
|
|
||||||
|
## P7.6 - Add incremental refresh
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0007-T006
|
||||||
|
status: todo
|
||||||
|
priority: medium
|
||||||
|
state_hub_task_id: "7d9472e6-0716-435b-866c-d2c66ad786cf"
|
||||||
|
```
|
||||||
|
|
||||||
|
Refresh only changed files based on content hash and parser version.
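
The refresh decision can be reduced to comparing fingerprints, as in this sketch. The function name `changed_sources` and the shape of the `indexed` mapping are hypothetical.

```python
from __future__ import annotations

import hashlib


def changed_sources(
    current: dict[str, bytes],
    indexed: dict[str, tuple[str, str]],
    parser_version: str,
) -> list[str]:
    """Return paths whose content hash or parser version differs from the index.

    `current` maps path -> file bytes; `indexed` maps path -> the
    (content_sha256, parser_version) pair recorded at the last build.
    """
    stale = []
    for path, content in sorted(current.items()):
        fingerprint = (hashlib.sha256(content).hexdigest(), parser_version)
        # New files have no indexed entry, so they are reported as stale too.
        if indexed.get(path) != fingerprint:
            stale.append(path)
    return stale
```

Files that were removed from disk are not handled here; a real refresh would also need to drop their rows from the index.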
|
||||||
|
|
||||||
|
## P7.7 - Add local index CLI
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0007-T007
|
||||||
|
status: todo
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "35cc63ff-3723-43d5-aaf6-f9312efa0f4b"
|
||||||
|
```
|
||||||
|
|
||||||
|
Add:
|
||||||
|
|
||||||
|
```text
|
||||||
|
mkt cache init
|
||||||
|
mkt cache build <path>
|
||||||
|
mkt cache query <selector-or-query>
|
||||||
|
mkt search <text>
|
||||||
|
```
|
||||||
|
|
||||||
|
## Exit Criteria
|
||||||
|
|
||||||
|
- Legacy AST/JSONPath value is recovered as an optional backend.
|
||||||
|
- Local repeated queries are faster and explainable.
|
||||||
|
- Simple selectors still work without cache.
|
||||||
109
workplans/MKTT-WP-0008-agent-working-memory-context-cache.md
Normal file
@@ -0,0 +1,109 @@
|
|||||||
|
---
|
||||||
|
id: MKTT-WP-0008
|
||||||
|
type: workplan
|
||||||
|
title: "Agent Working Memory Context Cache"
|
||||||
|
domain: markitect
|
||||||
|
status: todo
|
||||||
|
owner: markitect-tool
|
||||||
|
topic_slug: markitect
|
||||||
|
planning_priority: P3
|
||||||
|
planning_order: 90
|
||||||
|
depends_on_workplans:
|
||||||
|
- MKTT-WP-0006
|
||||||
|
- MKTT-WP-0007
|
||||||
|
- MKTT-WP-0009
|
||||||
|
created: "2026-05-03"
|
||||||
|
updated: "2026-05-03"
|
||||||
|
state_hub_workstream_id: "6269f338-4f5c-40ee-90e5-0371f5c3874c"
|
||||||
|
---
|
||||||
|
|
||||||
|
# MKTT-WP-0008: Agent Working Memory Context Cache
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
Create activatable context packages that let agents drop, reactivate, and
reuse project knowledge efficiently while preserving provenance and policy
metadata.
|
||||||
|
|
||||||
|
## P8.1 - Define context package schema
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0008-T001
|
||||||
|
status: todo
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "21ee9c37-4add-4886-bd03-a7bb4b20e957"
|
||||||
|
```
|
||||||
|
|
||||||
|
Represent source spans, summaries, token estimates, freshness, provenance,
policy labels, and retrieval recipes.
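
A schema along these lines might start from two dataclasses. This is a sketch only: the class and field names are assumptions, and the token estimate uses a crude characters-per-token heuristic, not a real tokenizer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceSpan:
    """A line range in one source file."""

    path: str
    line_start: int
    line_end: int


@dataclass
class ContextPackage:
    """Illustrative context package; all field names are hypothetical."""

    package_id: str
    spans: list[SourceSpan]
    summary: str
    retrieval_recipe: str  # e.g. the selector or query that built the package
    policy_labels: list[str] = field(default_factory=list)
    provenance: dict[str, str] = field(default_factory=dict)
    built_at: str | None = None  # freshness marker, e.g. an ISO timestamp

    def token_estimate(self) -> int:
        # Assumption: roughly four characters per token. A real
        # implementation would use the target model's tokenizer.
        return max(1, len(self.summary) // 4)
```

Keeping the retrieval recipe in the package is what would allow a later refresh to rebuild it from the same selector or query.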
|
||||||
|
|
||||||
|
## P8.2 - Implement package creation from queries
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0008-T002
|
||||||
|
status: todo
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "4df06b93-13ce-41fb-a8c3-f04d4ad9d752"
|
||||||
|
```
|
||||||
|
|
||||||
|
Create context packages from simple selectors, cached search results, or
manifest files.
|
||||||
|
|
||||||
|
## P8.3 - Implement activation lifecycle
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0008-T003
|
||||||
|
status: todo
|
||||||
|
priority: medium
|
||||||
|
state_hub_task_id: "9f3d9792-d655-482d-bae0-262df5fc0136"
|
||||||
|
```
|
||||||
|
|
||||||
|
Support activate, deactivate, refresh, and explain operations for a package.
|
||||||
|
|
||||||
|
## P8.4 - Add memory namespaces
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0008-T004
|
||||||
|
status: todo
|
||||||
|
priority: medium
|
||||||
|
state_hub_task_id: "2d090494-0e10-44cd-8e2d-c418d7530b27"
|
||||||
|
```
|
||||||
|
|
||||||
|
Support project, user, agent, thread, and task namespaces without hard-coding
any external agent platform.
|
||||||
|
|
||||||
|
## P8.5 - Add summary layers
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0008-T005
|
||||||
|
status: todo
|
||||||
|
priority: medium
|
||||||
|
state_hub_task_id: "4d1cf970-3d6d-4bd5-8da9-ec2399aa7efe"
|
||||||
|
```
|
||||||
|
|
||||||
|
Support deterministic summaries first, then optional LLM-generated summaries
through an injected adapter.
|
||||||
|
|
||||||
|
## P8.6 - Add CLI commands
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: MKTT-WP-0008-T006
|
||||||
|
status: todo
|
||||||
|
priority: medium
|
||||||
|
state_hub_task_id: "2f18386c-9d2c-4af1-b8e2-75cb487c1692"
|
||||||
|
```
|
||||||
|
|
||||||
|
Add:
|
||||||
|
|
||||||
|
```text
|
||||||
|
mkt context pack <manifest-or-query>
|
||||||
|
mkt context activate <package-id>
|
||||||
|
mkt context explain <package-id>
|
||||||
|
mkt context refresh <package-id>
|
||||||
|
```

## Exit Criteria

- Agents can reactivate project context by stable id.
- Context packages show included sources and token budgets.
- Memory writes remain explicit and inspectable.
105
workplans/MKTT-WP-0009-access-controlled-knowledge-gateway.md
Normal file
@@ -0,0 +1,105 @@
---
id: MKTT-WP-0009
type: workplan
title: "Access-Controlled Knowledge Gateway"
domain: markitect
status: todo
owner: markitect-tool
topic_slug: markitect
planning_priority: P2
planning_order: 80
depends_on_workplans:
- MKTT-WP-0006
created: "2026-05-03"
updated: "2026-05-03"
state_hub_workstream_id: "f36acbc9-881d-46f2-9181-67de228df0c2"
---

# MKTT-WP-0009: Access-Controlled Knowledge Gateway

## Purpose

Add a policy boundary for cached retrieval and context packages so Markitect can
support security-sensitive knowledge systems and agent workflows.

## P9.1 - Define access-control ladder

```task
id: MKTT-WP-0009-T001
status: todo
priority: high
state_hub_task_id: "acf240b4-7210-4ee5-90b6-2f2fe1438439"
```

Specify supported modes:

- labels and trust zones
- path/file ACLs
- relationship-based access control
- attribute/rule-based policies
- external policy engines

## P9.2 - Implement local label policy

```task
id: MKTT-WP-0009-T002
status: todo
priority: high
state_hub_task_id: "9eb589d2-82f2-4282-9af0-3958826d397d"
```

Start with local policy labels and diagnostics for denied or redacted results.
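
A local label policy with diagnostics might look like the following sketch. The result shape (dicts with `source`, `label`, and `text` keys), the diagnostic codes, and the `[redacted]` placeholder are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    code: str     # hypothetical codes: "policy.denied", "policy.redacted"
    source: str
    message: str


def apply_label_policy(results, allowed, redact=frozenset()):
    """Split results into visible items and diagnostics.

    results: list of dicts with "source", "label", and "text" keys.
    allowed: labels the caller may read in full.
    redact:  labels whose text is withheld while the item stays listed.
    Any other label is denied and the item is dropped.
    """
    visible, diagnostics = [], []
    for item in results:
        label = item["label"]
        if label in allowed:
            visible.append(item)
        elif label in redact:
            visible.append({**item, "text": "[redacted]"})
            diagnostics.append(
                Diagnostic("policy.redacted", item["source"], f"label {label!r} is redacted")
            )
        else:
            diagnostics.append(
                Diagnostic("policy.denied", item["source"], f"label {label!r} is not permitted")
            )
    return visible, diagnostics
```

Unknown labels fall through to denial, so a result with a mistyped label is hidden rather than exposed.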

## P9.3 - Add policy-aware query filtering

```task
id: MKTT-WP-0009-T003
status: todo
priority: high
state_hub_task_id: "d78ab623-c472-4b24-ad84-08464b574886"
```

Ensure results are filtered before leaving the backend boundary. Result
metadata must report whether policy filtering occurred.
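
The sketch below shows one way to keep filtering inside the boundary: the caller only ever receives a wrapped result whose metadata states whether anything was withheld. `QueryResult`, `run_query`, and the callable signatures are hypothetical names for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QueryResult:
    items: tuple[dict, ...]
    policy_filtered: bool  # True if at least one item was withheld
    withheld_count: int


def run_query(
    backend: Callable[[str], list[dict]],
    is_allowed: Callable[[dict], bool],
    query: str,
) -> QueryResult:
    # Raw backend results never leave this function unfiltered.
    raw = backend(query)
    kept = tuple(item for item in raw if is_allowed(item))
    withheld = len(raw) - len(kept)
    return QueryResult(kept, withheld > 0, withheld)
```

Exposing `withheld_count` is a design choice made for this sketch; in stricter settings even the count can reveal information, and a boolean flag alone may be preferable.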

## P9.4 - Add relationship policy adapter design

```task
id: MKTT-WP-0009-T004
status: todo
priority: medium
state_hub_task_id: "bd4c2b7a-6eac-4845-b5c8-9f9c64946f0c"
```

Design an adapter boundary for Zanzibar/OpenFGA/SpiceDB-style relationship
checks without binding the core package to any one service.
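
An adapter boundary of this kind could be expressed as a structural interface, as in the sketch below. The `RelationshipChecker` protocol and its `check` signature are assumptions for illustration; they do not reproduce the client API of OpenFGA, SpiceDB, or any other service. The in-memory class is a local stand-in for tests.

```python
from __future__ import annotations

from typing import Protocol


class RelationshipChecker(Protocol):
    """Hypothetical adapter boundary; not the API of any real service."""

    def check(self, subject: str, relation: str, obj: str) -> bool: ...


class InMemoryTuples:
    """Local stand-in that stores (subject, relation, object) tuples."""

    def __init__(self, tuples: set[tuple[str, str, str]]) -> None:
        self._tuples = set(tuples)

    def check(self, subject: str, relation: str, obj: str) -> bool:
        return (subject, relation, obj) in self._tuples


def can_read(checker: RelationshipChecker, user: str, document: str) -> bool:
    # The "viewer" relation and the "user:"/"document:" prefixes are illustrative.
    return checker.check(f"user:{user}", "viewer", f"document:{document}")
```

Because the protocol is structural, a wrapper around a remote service only needs a matching `check` method, and the calling code stays unchanged.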

## P9.5 - Add rule policy adapter design

```task
id: MKTT-WP-0009-T005
status: todo
priority: medium
state_hub_task_id: "752f1962-e83c-44cc-a1c1-0f89a4ea2a90"
```

Design an adapter boundary for OPA/Rego and Cedar-style rule policies.
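
For rule-style policies, the boundary might accept a single request document and return a decision, as sketched here. The `RulePolicy` protocol, the request keys (`principal`, `action`, `resource`), and the local predicate engine are assumptions; a real adapter would translate the request into whatever input an external engine expects.

```python
from __future__ import annotations

from typing import Callable, Protocol


class RulePolicy(Protocol):
    """Hypothetical boundary: one boolean decision per request document."""

    def evaluate(self, request: dict) -> bool: ...


class LocalRules:
    """Stand-in engine: allow only if every rule predicate passes."""

    def __init__(self, rules: list[Callable[[dict], bool]]) -> None:
        self._rules = list(rules)

    def evaluate(self, request: dict) -> bool:
        # Deny by default: an empty rule set allows nothing.
        return bool(self._rules) and all(rule(request) for rule in self._rules)
```

A usage sketch under the same assumptions:

```python
policy = LocalRules([
    lambda r: r["action"] == "read",
    lambda r: r["resource"]["label"] in r["principal"]["clearances"],
])
request = {
    "principal": {"id": "ana", "clearances": ["public"]},
    "action": "read",
    "resource": {"label": "public"},
}
allowed = policy.evaluate(request)
```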

## P9.6 - Add decision logs and explainability

```task
id: MKTT-WP-0009-T006
status: todo
priority: medium
state_hub_task_id: "990f01fa-5008-4871-a887-1c6ab4375605"
```

Record policy decisions with subject, action, object, context, decision,
reason, and provenance.
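
A decision record carrying the seven fields named above could be sketched as a frozen dataclass serialized to one JSON line per decision. The field types, the decision values, and the JSON-lines format are assumptions for illustration.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class PolicyDecision:
    subject: str
    action: str
    object: str
    decision: str  # assumed values: "allow", "deny", or "redact"
    reason: str
    context: dict = field(default_factory=dict)
    provenance: dict = field(default_factory=dict)


def to_log_line(record: PolicyDecision) -> str:
    # Sorted keys keep log lines stable and easy to diff.
    return json.dumps(asdict(record), sort_keys=True)
```

Writing the reason and provenance alongside the decision is what lets an `explain`-style command answer why a result was withheld without re-running the policy.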

## Exit Criteria

- Local caches can operate in an explicit policy mode.
- Query and context package results are policy-aware.
- More rigid authorization engines can attach later without replacing the
  query/cache framework.