research: Logseq deep dive (block-graph on plain Markdown files, in-file block IDs, derived Datalog index); UC-62/63
Occupies the point the other modern tools leave empty: block-graph semantics (UUID-addressable, embeddable, queryable blocks) stored as plain Markdown/Org files on disk, with a DataScript graph derived from the files (files canonical, index derived). The bridge between Roam (block-DB) and Obsidian (file-over-app).

Headline finding: Logseq resolves the addressing-spectrum tension — block-level addressing that is also git-diffable in-file text (id:: property) — and proves a file-backed shard can serve rich Datalog queries via a derived index. Also: file->SQLite "DB graph" migration is a live UC-43 (substrate swap under stable identity); whiteboards = non-Markdown content; dual-attachable (file-store direct with a Logseq format profile, or in-app plugin).

Added UC-62 (attach block-graph-on-plain-files shard), UC-63 (serve structured queries over a file shard via a derived index shard-wiki builds — converse of UC-52); enriched UC-32/34/43/50/51/52/55. Catalog now 63 UCs.

Architecture for SHARD-WP-0002 T11/T14/T16: Logseq format profile, derived-query-index capability, substrate-migration tolerance, in-file block addressing as the T16 span-address target.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
50
research/260614-logseq-deep-dive/README.md
Normal file
@@ -0,0 +1,50 @@
# 260614 — Logseq deep dive (block-graph semantics on plain Markdown files)

Date: 2026-06-14

## What this is

A focused study of **Logseq** — the open-source, local-first outliner — read through
shard-wiki's lens. Logseq occupies the point the other modern tools leave empty:
**block-graph semantics (UUID-addressable, embeddable, queryable blocks) stored as plain
Markdown/Org files on disk**, with a DataScript graph **derived** from those files. It is
the bridge between **Roam** (block-graph, client-DB) and **Obsidian** (file-over-app,
page-level), and it resolves a tension the synthesis flagged: block-level addressing
*that is also git-diffable text*.

Distinctive material:
- **Architecture** — files canonical (`pages/`, `journals/` MD/Org); a DataScript graph
  **derived** via an `mldoc` AST parse; a DB Worker now manages **both DataScript and
  SQLite** (the file→DB migration)
- **Block-graph in the file text** — block IDs as in-file `id:: <uuid>` properties,
  `((uuid))` refs, `key:: value` properties, `{{embed}}` transclusion, the outline tree
  (indent/zoom/move-subtree) — all git-diffable
- **Queries** — `{{query}}` + advanced **Datalog** over the derived graph
  (`logseq.DB.datascriptQuery`)
- **Extension** — plugin API (`logseq.App/Editor/DB/Git/UI/Assets/FileStorage`),
  marketplace (~486 plugins); dual-attachable (file-store direct *or* in-app plugin)
- **Trajectory** — migrating from Markdown-files to a SQLite "DB graph" (a live UC-43)

## Contents

| Path | Role |
|------|------|
| `findings.md` | Architecture, in-file block-graph, Datalog queries, plugin API, file→DB migration, capability profile, INTENT mapping, UC seeds, architecture notes, sources |

## Status

Initial deep dive complete. Two new use cases promoted to `spec/UseCaseCatalog.md`
(UC-62 attach a block-graph-on-plain-files shard with in-file block IDs/properties +
outline tree + derivable query index; UC-63 serve structured queries over a file-backed
shard via a derived index the orchestrator/adapter builds); UC-32/34/43/50/51/52/55
enriched. Logged for `SHARD-WP-0002` (T11/T14/T16): a Logseq **format profile** for
file-store adapters, a **derived-query-index** capability, **substrate-migration
tolerance**, and **in-file block addressing** as the concrete T16 span-address target.

**Key takeaway recorded:** Logseq resolves the addressing-spectrum tension — block-level
addressing **and** git-diffable in-file text (`id::`) — and proves a file-backed shard
can support rich Datalog queries via a **derived** index (files canonical, graph
derived). **Boundary:** one block-graph-on-files candidate shard (the addressing sweet
spot), best attached file-store-direct with a format profile; not the federation layer;
substrate is migrating file→SQLite.
246
research/260614-logseq-deep-dive/findings.md
Normal file
@@ -0,0 +1,246 @@
# Findings — Logseq: block-graph semantics on plain Markdown files

Date: 2026-06-14
Source kind: **modern shipped product** — an open-source, local-first outliner; a
*candidate shard* occupying the point none of the prior tools do:
**block-graph + file-over-app**
Lens: shard-wiki — the convergence of block-level addressing with git-diffable files, a
derived query index over canonical files, and a live file→DB storage migration

> Why Logseq matters distinctly. The modern-tool dives staked out the corners: **Roam**
> = block-graph but client-DB / in-app-API / store-minted UUID; **Obsidian** =
> file-over-app but page-level / path-addressed (block `^id` opt-in); **Notion** =
> block-graph but closed hosted DB / external-API. **Logseq sits in the middle they
> leave empty: block-graph *semantics* (UUID-addressable, embeddable, queryable blocks)
> stored as *plain Markdown/Org files on disk you own*, with a DataScript index
> **derived** from those files.** It is the system that proves you do not have to choose
> between Roam's fine-grained addressing and Obsidian's file sovereignty — you can have
> block-level addressing *that is also git-diffable text*. That resolves a tension the
> synthesis flagged (addressing spectrum) and is the reason it earns a memo rather than
> a footnote.

Contrast set: Roam (block-DB, in-app), Obsidian (file, page-level), Notion (hosted DB),
Joplin (SQLite-local, files-on-sync). Logseq = **block-graph, files-canonical,
index-derived** — and now *also* migrating toward SQLite (§5), a live UC-43 case.

---

## 1. Core architecture — files canonical, DataScript derived

- **Storage:** plain-text **Markdown or Org-mode files on disk** — `pages/<Page>.md` and
  `journals/<date>.md` — fully usable even if the app disappears (local-first, open
  source, ClojureScript).
- **Index:** a **DataScript** in-memory graph database, **derived** from the files: an
  `mldoc` parser turns Markdown into an AST, extracts references/properties, and
  transforms it into **DataScript entities** with tree-position attributes. Files are the
  source of truth; the graph DB is a rebuildable projection. (In current versions a **DB
  Worker** manages **both DataScript and SQLite** connections — see §5.)
- This is the **files-canonical / index-derived** architecture (Obsidian's MetadataCache,
  Git's working tree) — but here the derived index is a *full Datalog-queryable graph*,
  not just a metadata cache. Logseq is the strongest evidence that **a file-backed shard
  can support rich structured queries via a derived index** (UC-52, UC-63).
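The derivation step described above (outline text in, entities with tree-position attributes out) can be sketched in a few lines. This is an illustrative stand-in, not Logseq's `mldoc` parser: the `BlockEntity` fields, the `parse_outline` name, and the fixed two-space indent unit are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockEntity:
    eid: int                 # hypothetical entity id assigned during parsing
    text: str                # bullet content without the leading "- "
    level: int               # nesting depth, 0 = top-level block
    parent: Optional[int]    # eid of the parent block, None for top-level
    order: int               # position among siblings


def parse_outline(markdown: str, indent: str = "  ") -> list[BlockEntity]:
    """Derive block entities from an indented bullet outline.

    Assumes each block starts with "- " and nesting uses a fixed indent unit.
    """
    entities: list[BlockEntity] = []
    stack: list[BlockEntity] = []                 # current ancestor chain
    sibling_count: dict[Optional[int], int] = {}  # next sibling slot per parent
    for line in markdown.splitlines():
        stripped = line.lstrip()
        if not stripped.startswith("- "):
            continue  # property/continuation lines are ignored in this sketch
        lead = line[: len(line) - len(stripped)].replace("\t", indent)
        level = len(lead) // len(indent)
        del stack[level:]                         # drop ancestors at or below this depth
        parent = stack[-1].eid if stack else None
        order = sibling_count.get(parent, 0)
        sibling_count[parent] = order + 1
        entity = BlockEntity(len(entities) + 1, stripped[2:], len(stack), parent, order)
        entities.append(entity)
        stack.append(entity)
    return entities
```

Because the entity list is computed purely from the file text, discarding it and re-running `parse_outline` over the same file yields the same projection, which is the property "files canonical, index derived" relies on.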

---

## 2. The block-graph, in the file text

Everything is an **atomic block** (a bullet), individually referenceable, embeddable, and
queryable — Roam's model — but the addressing lives **in the Markdown**:

- **Block IDs are in-file properties.** When a block is referenced, Logseq writes an
  `id:: <uuid>` property line into the Markdown. So the block's stable address is
  **git-diffable text that survives a file copy**, not a DB-minted hidden key. This is the
  **sweet spot of the addressing spectrum**: block-level (like Roam) *and*
  in-file/portable (like Obsidian's `^id`), without choosing.
- **References:** `[[page]]`, **`((block-uuid))`**, `#tag` — all extracted into the graph;
  block embeds `{{embed ((uuid))}}` are transclusion at block granularity (UC-32).
- **Properties:** `key:: value` lines attach typed metadata to blocks (block properties)
  and pages (first-block/page properties) — **git-diffable structured data at block
  granularity** (UC-34), queried by Datalog.
- **Outline tree:** a page is a **nested tree of blocks** (indent = structure), with
  outliner operations — indent/outdent, move subtree, **zoom/narrow** into a block. The
  page is not flat prose; it is an addressable, reorderable block tree (UC-63).
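To make the in-file syntax concrete, here is a minimal sketch of reading one block's lines into its `id::`, its other `key:: value` properties, and its outgoing references. The regular expressions are approximations written for this example, and `read_block` is a hypothetical helper, not part of Logseq or its plugin API.

```python
import re

# Approximate patterns for the Logseq-style syntax discussed above (assumptions).
PROPERTY_RE = re.compile(r"^([A-Za-z0-9_.-]+)::\s*(.*)$")
BLOCK_REF_RE = re.compile(r"\(\(([0-9a-fA-F-]{36})\)\)")
PAGE_REF_RE = re.compile(r"\[\[([^\[\]]+)\]\]")
TAG_RE = re.compile(r"(?<!\S)#([\w-]+)")


def read_block(lines: list[str]) -> dict:
    """Split one block's lines into content, properties, and outgoing references."""
    content: list[str] = []
    properties: dict[str, str] = {}
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("- "):
            stripped = stripped[2:]          # drop the bullet marker
        match = PROPERTY_RE.match(stripped)
        if match:
            properties[match.group(1)] = match.group(2).strip()
        else:
            content.append(stripped)
    text = "\n".join(content)
    return {
        "id": properties.get("id"),
        "properties": {k: v for k, v in properties.items() if k != "id"},
        "text": text,
        "block_refs": BLOCK_REF_RE.findall(text),
        "page_refs": PAGE_REF_RE.findall(text),
        "tags": TAG_RE.findall(text),
    }
```

Everything the function returns is recovered from plain text, so the same values remain visible in an ordinary `git diff` of the page file.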

---

## 3. Queries — Datalog over the derived graph

Logseq exposes **`{{query}}`** (simple) and **advanced Datalog queries** over the
DataScript graph (and via the plugin API, `logseq.DB.datascriptQuery`). Because the graph
is derived from files, **query-defined pages** (UC-54) and structured aggregation (tasks,
tagged blocks) run over a *file-backed* store. Key lesson: query capability here is
**neither native-DB (Notion) nor a third-party plugin (Obsidian Dataview)** — it is
**built into the tool over its own files**, demonstrating that *a derived query index is a
viable adapter/orchestrator capability for file shards* (UC-63).
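As a rough illustration of what a derived query index over file content could look like on the shard-wiki side, the sketch below stores (entity, attribute, value) facts and answers conjunctive attribute/value lookups. It is deliberately much simpler than DataScript or Datalog (no variables, joins, or rules), and the `DerivedIndex` class and its method names are hypothetical.

```python
from collections import defaultdict

Datom = tuple[str, str, str]  # (entity id, attribute, value)


class DerivedIndex:
    """Rebuildable attribute/value index over blocks read from files."""

    def __init__(self) -> None:
        # (attribute, value) -> entities carrying that pair
        self._by_av: dict[tuple[str, str], set[str]] = defaultdict(set)
        # entity -> attribute -> values
        self._by_e: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))

    def add(self, datom: Datom) -> None:
        entity, attribute, value = datom
        self._by_av[(attribute, value)].add(entity)
        self._by_e[entity][attribute].add(value)

    def where(self, **clauses: str) -> set[str]:
        """Return entities matching every attribute=value clause (a conjunction)."""
        matches = [self._by_av.get((a, v), set()) for a, v in clauses.items()]
        return set.intersection(*matches) if matches else set()

    def values(self, entity: str, attribute: str) -> set[str]:
        """Return all values recorded for one attribute of one entity."""
        return set(self._by_e.get(entity, {}).get(attribute, set()))
```

A task listing then becomes `index.where(tag="task", status="todo")`. Since the index holds nothing that is not in the files, it can be thrown away and rebuilt at any time.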

---

## 4. Extension surface — plugin API + marketplace

Logseq has a JS **plugin API** (marketplace ~486 plugins, `logseq/marketplace`):

- `logseq.Editor` — block/page CRUD, insert/move/update/delete, get current block/page.
- `logseq.DB` — **`datascriptQuery`** (Datalog), plus DB change subscriptions.
- `logseq.App` — app-level ops/events; `logseq.UI`; `logseq.Git` (git ops on the graph);
  `logseq.Assets`; `logseq.FileStorage`.
- Slash commands, block/page context menus, `provideModel`/`provideStyle`, hooks/events.

So, like Obsidian, Logseq is **dual-attachable**: (a) **file-store direct** — read the
`pages/`+`journals/` Markdown with a **Logseq format profile** (parse `id::`, `((uuid))`,
`key::`, the outline tree); (b) **in-app plugin host** — `logseq.Editor` write +
`logseq.DB` query + change events for live write-through. A notable extra: a built-in
**`logseq.Git`** surface — the tool treats git as a first-class companion to the file
graph (validating the coordination journal).

---

## 5. The file→DB migration — a live UC-43

Logseq is **migrating its storage model from Markdown-files to a SQLite "DB graph"**
(the DB Worker already manages SQLite alongside DataScript; the plugin API has a distinct
"DB graph" mode with tags/classes/typed properties). This is a real-world instance of
**UC-43 (backend-swap under stable identity)**: the *same tool and graph identity* moving
from a file substrate to a DB substrate, trading git-diffability for richer typed
structure (toward the Notion/XWiki end). For shard-wiki it is both a caution (a shard's
substrate can change under it) and a confirmation that the **addressing/structure/history
spectra** are trajectories tools actually travel — an adapter keyed to a fixed substrate
will break.
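One way to picture "keyed to identity, not to substrate" is a binding that carries a stable graph identifier plus a substrate field that is re-probed rather than assumed. The sketch below is hypothetical throughout: `Substrate`, `Binding`, `READERS`, `reprobe`, and `read` are names invented for the example, and the readers only return descriptive strings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Substrate(Enum):
    MARKDOWN_FILES = "markdown-files"
    SQLITE_DB_GRAPH = "sqlite-db-graph"


@dataclass
class Binding:
    graph_id: str          # stable identity of the attached graph (hypothetical)
    substrate: Substrate   # current substrate, re-probed rather than assumed


# Hypothetical readers, one per substrate; real ones would open files or a database.
READERS: dict[Substrate, Callable[[Binding], str]] = {
    Substrate.MARKDOWN_FILES: lambda b: f"read {b.graph_id} via file-store profile",
    Substrate.SQLITE_DB_GRAPH: lambda b: f"read {b.graph_id} via DB-graph reader",
}


def reprobe(binding: Binding, detected: Substrate) -> Binding:
    """Follow a substrate change while keeping the graph identity unchanged."""
    return Binding(graph_id=binding.graph_id, substrate=detected)


def read(binding: Binding) -> str:
    """Dispatch on the binding's current substrate instead of hard-coding one."""
    reader = READERS.get(binding.substrate)
    if reader is None:
        raise LookupError(f"no reader for substrate {binding.substrate.value}")
    return reader(binding)
```

The design choice illustrated is that a file→SQLite move changes only which reader is selected; anything that refers to the graph by `graph_id` is unaffected.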

---

## 6. Logseq as a shard — capability profile

| Capability | Logseq (MD-file graph) | Notes for the adapter contract |
|------------|------------------------|--------------------------------|
| Read | **yes** | file-store direct (pages/journals MD) or `logseq.Editor`/`logseq.DB` in-app |
| Write | **yes** | direct file write (format-aware) or plugin; block-level |
| Write granularity | **per-block** (in a per-file page) | finer than Obsidian (per-file), like Roam — but on files |
| Identity / addressing | **in-file block `id:: uuid` + `((uuid))`** | block-level **and** git-diffable — the addressing sweet spot (UC-51) |
| Structure | `key:: value` block/page properties; outline tree | git-diffable structured data at block granularity (UC-34) |
| History | none native; **`logseq.Git`** + files = git-native | git-friendly out of the box (adopt, not supplement) |
| Native query | **Datalog over derived DataScript** | derived index over files → delegate or rebuild (UC-52, UC-63) |
| Transclusion | **block embeds `{{embed ((uuid))}}`** | in-file, addressable (UC-32) |
| Backlinks | linked + unlinked references | derived (UC-05/18) |
| Content types | Markdown/Org + **whiteboards** (tldraw JSON) | non-Markdown spatial content (UC-55) |
| Substrate | **files now, SQLite "DB graph" emerging** | live backend-swap (UC-43) |
| Attach modes | file-store direct (format profile) · in-app plugin | dual, per-binding (UC-62) |

Verdict: **the most shard-wiki-friendly block tool** — block-graph power with
file-over-app sovereignty and git-diffable addressing/structure. Best attach: **file-store
direct with a Logseq format profile** (offline, git-native), with the plugin for live
write-through. Watch the **DB-graph migration** (UC-43).

---

## 7. Mapping to shard-wiki INTENT (compare, do not equate)

### 7.1 Reinforcements

- **Resolves the addressing tension.** Block-level addressing *can* be git-diffable
  in-file (`id::`), not forced to choose Roam-DB vs Obsidian-page (UC-51, synthesis §2).
- **Confirms files-canonical / index-derived at full power** — a Datalog graph derived
  from files, not just a metadata cache (UC-52, UC-63).
- **Structure as git-diffable text** (`key::`) reinforces "prefer in-text structure"
  (UC-34, synthesis through-line).
- **Outliner block tree** validates the page-as-addressable-block-tree demand (UC-50,
  UC-63) on a *file* backend.

### 7.2 Deliberate divergences (design bugs if conflated)

1. **One graph = one shard.** Local-first single graph; not the federation layer.
2. **The MD files carry Logseq-specific syntax** (`id::`, `((uuid))`, `key::`, outline
   semantics) — a **format profile**, not plain CommonMark. A naïve Markdown reader will
   mangle block IDs/properties (cf. UC-42; lighter than Notion's lossy case).
3. **The substrate is moving (file→SQLite).** Don't hard-code the file model; gate on the
   substrate capability and tolerate the swap (UC-43).
4. **Whiteboards are not Markdown** — typed/opaque assets, not flattened (UC-55).

### 7.3 What Logseq teaches that shard-wiki should keep

- **In-file block addressing is the target shape** for a portable span address where the
  backend cooperates — adopt `id::`-style schemes; they are git-diffable and survive copy.
- **A derived query index over files is a first-class capability** — shard-wiki can build
  one over a file shard (UC-63) when the backend exposes none, or delegate when it does.
- **Expect substrate migration** — bind to capabilities, not to "it's files."
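The "survives copy" property of an `id::`-style address can be demonstrated with a small resolver that finds a block by scanning file text rather than by remembering a path. This is a sketch under assumptions: `locate_block` is a hypothetical helper, the graph is taken to be a directory of `.md` files, and a real adapter would consult a derived index instead of rescanning on every lookup.

```python
import re
from pathlib import Path
from typing import Optional

ID_LINE_RE = re.compile(r"^\s*id::\s*([0-9a-fA-F-]{36})\s*$")


def locate_block(graph_root: Path, block_id: str) -> Optional[tuple[Path, int]]:
    """Find the file and 1-based line number of the `id::` line for a block.

    Scans every Markdown file under graph_root, so the address does not depend
    on which file currently holds the block.
    """
    for path in sorted(graph_root.rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, 1):
            match = ID_LINE_RE.match(line)
            if match and match.group(1).lower() == block_id.lower():
                return path, number
    return None
```

If a page file is renamed or its block is moved to another page, the same `block_id` still resolves, because the identifier travels with the text it addresses.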

---

## 8. Use-case seeds → catalog (promoted 2026-06-14)

Last existing UC is **UC-61**. New UCs **UC-62, UC-63** added; existing UCs enriched.

| Seed | Catalog action |
|------|----------------|
| **Attach a block-graph-on-plain-files shard** (Logseq-style): file-over-app Markdown carrying in-file block IDs (`id::`), block refs, and `key::` properties, with the outline tree and a derivable query index | **UC-62 (new)** |
| **Serve structured queries over a file-backed shard via a derived index** the orchestrator/adapter builds (Logseq DataScript-over-files) when the backend exposes none | **UC-63 (new)** |
| In-file block `id::` = block-level **and** git-diffable addressing (the spectrum sweet spot) | **enriches UC-51** |
| Live **file→SQLite substrate migration** under stable graph identity | **enriches UC-43** |
| Block-graph that is **files, not a DB** — the file-store variant of the block tool family | **enriches UC-50** |
| Datalog over a **derived** index built from files | **enriches UC-52** |
| `key:: value` block/page properties in-text | **enriches UC-34** |
| Whiteboards (tldraw JSON) = non-Markdown content | **enriches UC-55** |
| Block embeds `{{embed ((uuid))}}` = in-file transclusion | links UC-32 |

---

## 9. Architecture notes for SHARD-WP-0002 (no UC)

- **Format-aware file-store profiles** (already flagged by Joplin, UC-60) gain a strong
  case here: a Logseq profile parses `id::`/`((uuid))`/`key::`/outline from
  otherwise-plain Markdown. The contract should let a file-store adapter declare its
  **format profile** (plain MD / Obsidian / Logseq / Joplin-item / Foswiki-PlainFile).
  (T11/T14.)
- **Derived-query-index as an orchestrator capability** (UC-63): when a file shard has no
  native query engine, build a DataScript-like index over the projection; when it does
  (Roam/Notion/XWiki), delegate (UC-52). Both belong in T16's navigation layer + T11.
- **Substrate-migration tolerance** (UC-43, Logseq file→SQLite): T14 binding should treat
  substrate as a capability that can change under a live attachment, preserving identity.
- **In-file block addressing** is the concrete realization of T16's span-address thread —
  prefer `id::`-style git-diffable IDs over DB-minted where the backend allows.
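A format-profile declaration of the kind the first note calls for might look like the sketch below. The profile names follow the list above; the `ProfileFeatures` fields, the `FEATURES` table, and the `declare` function are assumptions for illustration, and only the plain-Markdown and Logseq rows are filled in rather than guessing the others.

```python
from dataclasses import dataclass
from enum import Enum


class FormatProfile(Enum):
    PLAIN_MD = "plain-md"
    OBSIDIAN = "obsidian"
    LOGSEQ = "logseq"
    JOPLIN_ITEM = "joplin-item"
    FOSWIKI_PLAINFILE = "foswiki-plainfile"


@dataclass(frozen=True)
class ProfileFeatures:
    block_ids: bool          # in-file block addresses such as `id::`
    block_properties: bool   # `key:: value` metadata at block granularity
    outline_tree: bool       # indentation carries structure


# Illustrative values only; profiles without an entry are left out, not guessed.
FEATURES = {
    FormatProfile.PLAIN_MD: ProfileFeatures(False, False, False),
    FormatProfile.LOGSEQ: ProfileFeatures(True, True, True),
}


def declare(profile: FormatProfile) -> ProfileFeatures:
    """Return what an adapter may rely on; unknown profiles fall back to plain Markdown."""
    return FEATURES.get(profile, FEATURES[FormatProfile.PLAIN_MD])
```

Falling back to the plain-Markdown feature set for an undeclared profile is a conservative choice: the adapter then treats `id::` and `key::` lines as opaque text instead of misreading them.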

---

## 10. Open questions (for spec / workplans)

1. When the same tool offers **file** and **DB** substrates (Logseq now), does shard-wiki
   prefer the file graph (git-diffable, UC-62) or the DB graph (richer typed structure),
   and can one binding follow the migration (UC-43)?
2. Is the **derived query index** (UC-63) built by the **adapter** (per-shard) or the
   **core orchestrator** (over the union), and is it persisted or rebuilt?
3. How much **Logseq outline semantics** (zoom, subtree move) must shard-wiki preserve vs.
   present as a flat page (UC-63 vs. Markdown-first page model)?
4. Does the **Logseq format profile** round-trip overlays (write `id::`/`key::` back) or
   only read them? (cf. UC-42 round-trip question.)

---

## 11. Sources

| Source | Used for |
|--------|----------|
| DeepWiki — logseq/logseq (https://deepwiki.com/logseq/logseq) | DataScript+Datalog graph, mldoc AST parse, block entities/tree, DB Worker managing DataScript **and** SQLite |
| logseq/docs (https://deepwiki.com/logseq/docs) | Plain-text MD/Org files; pages/journals; references ([[ ]], (( )), #tag); properties |
| Logseq Plugin API docs (https://plugins-doc.logseq.com/) | `logseq.App/Editor/DB/Git/UI/Assets/FileStorage`; `DB.datascriptQuery` |
| logseq/marketplace (https://github.com/logseq/marketplace) | Plugin distribution; ~486 plugins |
| kerim/logseq-db-plugin-api-skill (https://github.com/kerim/logseq-db-plugin-api-skill) | DB-graph version: tags/classes, typed properties, file→DB migration |
| pangea.app glossary — Logseq (https://pangea.app/glossary/logseq) | Local-first framing, outliner, plain-text control |

Cross-references: `research/260614-roam-deep-dive/findings.md` (block-DB contrast),
`research/260614-obsidian-deep-dive/findings.md` (file-over-app, derived index),
`research/260614-joplin-deep-dive/findings.md` (format-aware file-store profiles),
`research/260614-shard-spectrum-synthesis/findings.md` (addressing spectrum this
resolves), `spec/UseCaseCatalog.md` (UC-32, UC-34, UC-43, UC-50, UC-51, UC-52, UC-55),
`workplans/SHARD-WP-0002-federation-architecture.md` (T11, T14, T16).

---

## 12. Traceability

- New UCs: **UC-62, UC-63** → `spec/UseCaseCatalog.md`.
- Enriched UCs: **UC-32, UC-34, UC-43, UC-50, UC-51, UC-52, UC-55**.
- Architecture (no UC): format-aware file-store profiles (Logseq profile);
  derived-query-index capability; substrate-migration tolerance; in-file block addressing
  as the T16 span-address target → `SHARD-WP-0002` (T11, T14, T16).
- Boundary recorded: Logseq is **one block-graph-on-files candidate shard** (the
  addressing sweet spot), best attached file-store-direct with a format profile; not the
  federation layer; substrate is migrating file→SQLite (UC-43).
@@ -23,4 +23,5 @@ when multiple files or sources are involved. Findings here inform `spec/` and
| 2026-06-14 | `260614-obsidian-deep-dive/` | Obsidian — file-over-app vaults, plugin API, ecosystem-popularity → UC signal; UC-53/54/55/56 |
| 2026-06-14 | `260614-notion-deep-dive/` | Notion — closed block-DB SaaS, external REST API only, database-as-pages; UC-57/58/59 |
| 2026-06-14 | `260614-shard-spectrum-synthesis/` | Synthesis — shard family matrix + eleven capability spectra across nine systems; feeds SHARD-WP-0002 T11–T16 |
| 2026-06-14 | `260614-joplin-deep-dive/` | Joplin — SQLite-local/Markdown-on-sync, interchange-format attach, E2EE content opacity; UC-60/61 |
| 2026-06-14 | `260614-logseq-deep-dive/` | Logseq — block-graph on plain Markdown files, in-file block IDs, derived Datalog index; UC-62/63 |