# Content References
Date: 2026-05-04
## Purpose
Content references are the first WP-0010 extension layer. They give Markitect a
shared way to name and resolve Markdown content units without changing the
existing parser, query, transform, compose, include, contract, or cache APIs.
The goal is a small resolver that later features can reuse:
- includes can accept references as well as paths
- explode/implode can write manifests with stable unit IDs
- processors can receive typed units and dependency edges
- tangle/weave can address chunks and generated outputs
- cache and access-control backends can index the same IDs
## Reference Syntax
References are compact strings:
```text
path/to/file.md
path/to/file.md#section:introduction
path/to/file.md::sections[heading=Decision]
std:clauses/payment.md
std:clauses/payment.md#payment-terms
std:clauses/payment.md#region:boilerplate
std:clauses/payment.md#tag:legal
#local-section
```
The parts are:
- `namespace:`: optional namespace declared in frontmatter
- `path`: a Markdown file path relative to the current document, or relative to
  the namespace target
- `#fragment`: optional unit lookup inside the target document
- `::selector`: optional existing Markitect query selector
Fragments and selectors are mutually exclusive during resolution. Selectors are
delegated to the existing query engine, which keeps this layer small and avoids
inventing a second query language.
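
The split described above can be sketched in a few lines of Python. This is a minimal illustration, not Markitect's actual parser: `ParsedRef` and `parse_ref` are hypothetical names, and the sketch assumes that a namespace prefix never contains `/` and that combining a fragment with a selector is rejected outright.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedRef:
    """Hypothetical container for the four parts of a reference string."""
    namespace: Optional[str]
    path: str
    fragment: Optional[str]
    selector: Optional[str]


def parse_ref(ref: str) -> ParsedRef:
    """Split a reference into namespace, path, fragment, and selector."""
    # Everything after the first "::" is handed to the query engine untouched.
    head, sep, tail = ref.partition("::")
    selector = tail if sep else None

    # Everything after the first "#" in the remaining head is the fragment.
    head, sep, tail = head.partition("#")
    fragment = tail if sep else None

    # Assumption: a reference carrying both is an error rather than a fallback.
    if fragment is not None and selector is not None:
        raise ValueError("fragment and selector are mutually exclusive")

    # Assumption: a namespace is a non-empty prefix before ":" without "/".
    namespace = None
    prefix, sep, rest = head.partition(":")
    if sep and prefix and "/" not in prefix:
        namespace, head = prefix, rest

    return ParsedRef(namespace, head, fragment, selector)
```

With this sketch, `parse_ref("std:clauses/payment.md#region:boilerplate")` yields the namespace `std`, the path `clauses/payment.md`, and the fragment `region:boilerplate`, while `#local-section` yields an empty path and the fragment `local-section`.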
## Namespaces
Namespaces live in Markdown frontmatter:
```yaml
---
namespaces:
  std: ./standard
  product: ../product-docs
---
```
Namespace keys may be written with or without a trailing colon. Namespace values
are string paths. Relative namespace paths resolve under the resolver root. All
resolved file paths must stay inside that root.
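
The two rules above (optional trailing colon on keys, and containment inside the resolver root) could look roughly like this. It is a sketch under assumptions: `normalize_namespaces` and `resolve_in_root` are hypothetical helpers, and the base directory that relative namespace paths are joined onto is left to the caller.

```python
from pathlib import Path


def normalize_namespaces(raw: dict) -> dict:
    """Accept keys written as 'std' or 'std:' and return colon-free keys."""
    return {key.rstrip(":"): value for key, value in raw.items()}


def resolve_in_root(root: Path, *parts: str) -> Path:
    """Join path parts onto the root and reject results that escape it."""
    root = root.resolve()
    candidate = root.joinpath(*parts).resolve()
    # Path.is_relative_to requires Python 3.9 or newer.
    if not candidate.is_relative_to(root):
        raise ValueError(f"{candidate} is outside resolver root {root}")
    return candidate
```

Resolving both paths before comparing them means that `..` segments are collapsed first, so a reference cannot step outside the root by path traversal.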
## Content Units
The resolver currently emits these unit kinds:
- `document`: full Markdown file
- `section`: heading-led Markdown section
- `heading`: heading line
- existing query kinds such as `frontmatter`, `block`, `metrics`, or `section`
Each unit includes:
- `unit_id`: stable local ID
- `kind`
- `source_path`
- source line span when available
- `name`
- `content_hash`
- raw text
- metadata from the source or query match
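
As a rough picture of the record described above, the fields map naturally onto a dataclass. The class name `ContentUnit`, the field names `text` and `line_span`, and the choice of SHA-256 over the raw text for `content_hash` are all assumptions made for this sketch; the resolver's real types may differ.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class ContentUnit:
    """Hypothetical shape of a resolved content unit."""
    unit_id: str
    kind: str
    source_path: str
    name: str
    text: str
    line_span: Optional[Tuple[int, int]] = None  # absent when unknown
    metadata: dict = field(default_factory=dict)

    @property
    def content_hash(self) -> str:
        # Assumption: the hash is SHA-256 of the UTF-8 encoded raw text.
        return hashlib.sha256(self.text.encode("utf-8")).hexdigest()
```

Deriving the hash from the raw text means two units with identical content share a hash regardless of where they live, which is the property a cache or manifest would likely rely on.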
Heading and section IDs use an explicit trailing heading ID when present:
```markdown
## Payment Terms {#payment-terms}
```
Otherwise the resolver derives a slug from the heading text and adds numeric
suffixes for collisions.
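
One way to picture the ID rule above is the following sketch. The slug algorithm and the `-1`, `-2` suffix scheme are assumptions chosen for illustration (the resolver's exact slug rules are not specified here), and `slugify` and `heading_id` are hypothetical names.

```python
import re

# Matches a trailing explicit ID such as "{#payment-terms}".
_EXPLICIT_ID = re.compile(r"\s*\{#([A-Za-z0-9_.:-]+)\}\s*$")


def slugify(heading: str) -> str:
    """Lowercase, drop punctuation, and join words with hyphens."""
    text = re.sub(r"[^\w\s-]", "", heading.lower())
    return re.sub(r"[\s_-]+", "-", text).strip("-")


def heading_id(heading: str, seen: dict) -> str:
    """Return the explicit ID if present, else a collision-free slug."""
    match = _EXPLICIT_ID.search(heading)
    if match:
        return match.group(1)
    base = slugify(heading) or "section"
    count = seen.get(base, 0)
    seen[base] = count + 1
    # Assumed scheme: first use is bare, later uses get -1, -2, ...
    return base if count == 0 else f"{base}-{count}"
```

The `seen` dictionary is scoped to one document, so the same heading text in two different files gets the same slug in each.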
Named regions use HTML comments so they can live in Markdown and many source
files without changing the rendered document:
```markdown
<!-- mkt:region id="boilerplate" tags="legal reuse" -->
Reusable text.
<!-- /mkt:region -->
```
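
To illustrate how such markers might be located, here is a minimal regex-based sketch. It assumes regions are not nested, that attributes are always double-quoted, and that `tags` is a space-separated list; `find_regions` is a hypothetical helper, not part of Markitect.

```python
import re

# Opening marker, body, and closing marker of one non-nested region.
_REGION = re.compile(
    r"<!--\s*mkt:region\s+(?P<attrs>.*?)\s*-->\n?"
    r"(?P<body>.*?)"
    r"<!--\s*/mkt:region\s*-->",
    re.DOTALL,
)
_ATTR = re.compile(r'([\w-]+)="([^"]*)"')


def find_regions(text: str) -> list:
    """Return one dict per region with its id, tags, and enclosed text."""
    regions = []
    for match in _REGION.finditer(text):
        attrs = dict(_ATTR.findall(match.group("attrs")))
        regions.append({
            "id": attrs.get("id"),
            # Assumption: tags are separated by whitespace.
            "tags": attrs.get("tags", "").split(),
            "text": match.group("body"),
        })
    return regions
```

Applied to the example above, this returns a single region with the ID `boilerplate`, the tags `legal` and `reuse`, and the enclosed line of text.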
Fenced blocks can be addressed when their info string includes an ID:
````markdown
```python {#load-config tags="code setup" tangle="src/config.py"}
def load_config():
    return {}
```
````
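
An info string like the one above could be split into language, ID, and attributes along these lines. This is an illustrative sketch only: `parse_info_string` is a hypothetical name, and the grammar (one `{...}` group, `#id` shorthand, double-quoted values) is inferred from the single example rather than from a specification.

```python
import re

_INFO = re.compile(r"^(?P<lang>[^\s{]*)\s*(?:\{(?P<attrs>.*)\})?\s*$")
_PAIR = re.compile(r'([\w-]+)="([^"]*)"')
_ID = re.compile(r"#([\w.-]+)")


def parse_info_string(info: str) -> dict:
    """Split a fence info string into language, ID, and key/value attrs."""
    info = info.strip()
    match = _INFO.match(info)
    if match is None:
        # Unrecognized shape: treat the whole string as the language.
        return {"lang": info or None, "id": None, "attrs": {}}
    attrs_text = match.group("attrs") or ""
    # Remove quoted pairs first so a "#" inside a value is not read as an ID.
    id_match = _ID.search(_PAIR.sub("", attrs_text))
    return {
        "lang": match.group("lang") or None,
        "id": id_match.group(1) if id_match else None,
        "attrs": dict(_PAIR.findall(attrs_text)),
    }
```

A fence without an ID in its info string comes back with `id` set to `None`, which matches the rule that only fences carrying an ID are addressable.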
Supported fragments now include:
- `#section:<id-or-heading-slug>`
- `#heading:<id-or-heading-slug>`
- `#region:<id>`
- `#fence:<id>`
- `#tag:<tag>`
- `#line:<start>` or `#line:<start>-<end>`
- `#<id>` as a convenience lookup across sections, regions, fenced blocks, and
  headings
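
The fragment forms listed above can be classified with a small dispatcher. The sketch below is hypothetical (`parse_fragment` is not a Markitect function) and assumes line numbers are 1-based with an inclusive end; an unprefixed fragment falls through to the convenience `id` lookup.

```python
_KINDS = {"section", "heading", "region", "fence", "tag", "line"}


def parse_fragment(fragment: str) -> tuple:
    """Return (kind, value); a bare '#<id>' fragment maps to kind 'id'."""
    kind, sep, value = fragment.partition(":")
    if not sep or kind not in _KINDS:
        # No known prefix: look the ID up across all unit kinds.
        return ("id", fragment)
    if kind == "line":
        start, dash, end = value.partition("-")
        first = int(start)
        last = int(end) if dash else first
        # Assumption: lines are 1-based and the range is inclusive.
        if first < 1 or last < first:
            raise ValueError(f"invalid line range: {value}")
        return ("line", (first, last))
    return (kind, value)
```

For example, `parse_fragment("line:10-20")` returns `("line", (10, 20))`, and `parse_fragment("payment-terms")` returns `("id", "payment-terms")`.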
## CLI
Resolve a reference from a context document:
```bash
mkt ref resolve examples/references/context.md 'std:clauses.md#payment-terms'
```
JSON and YAML formats include the resolved text and metadata:
```bash
mkt ref resolve examples/references/context.md 'std:clauses.md::sections[heading=Warranty]' --format json
```
## Extension Boundary
This layer is intentionally read-only. It does not replace `mkt include`,
`mkt query`, or `mkt extract`. Instead it defines the address model those tools
can adopt when their next WP-0010 tasks require richer content identity,
processor dependencies, source maps, and reversible manifests.