Files
shard-wiki/research/260614-logseq-deep-dive/findings.md
tegwick 6ccf349209 research: Logseq deep dive (block-graph on plain Markdown files, in-file block IDs, derived Datalog index); UC-62/63
Occupies the point the other modern tools leave empty: block-graph
semantics (UUID-addressable, embeddable, queryable blocks) stored as plain
Markdown/Org files on disk, with a DataScript graph derived from the files
(files canonical, index derived). The bridge between Roam (block-DB) and
Obsidian (file-over-app). Headline finding: Logseq resolves the
addressing-spectrum tension — block-level addressing that is also
git-diffable in-file text (id:: property) — and proves a file-backed shard
can serve rich Datalog queries via a derived index. Also: file->SQLite
"DB graph" migration is a live UC-43 (substrate swap under stable
identity); whiteboards = non-Markdown content; dual-attachable (file-store
direct with a Logseq format profile, or in-app plugin). Added UC-62
(attach block-graph-on-plain-files shard), UC-63 (serve structured
queries over a file shard via a derived index shard-wiki builds — converse
of UC-52); enriched UC-32/34/43/50/51/52/55. Catalog now 63 UCs.
Architecture for SHARD-WP-0002 T11/T14/T16: Logseq format profile,
derived-query-index capability, substrate-migration tolerance, in-file
block addressing as the T16 span-address target.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 15:27:37 +02:00

14 KiB

Findings — Logseq: block-graph semantics on plain Markdown files

Date: 2026-06-14 Source kind: modern shipped product — an open-source, local-first outliner; a candidate shard occupying the point none of the prior tools do: block-graph + file- over-app Lens: shard-wiki — the convergence of block-level addressing with git-diffable files, a derived query index over canonical files, and a live file→DB storage migration

Why Logseq matters distinctly. The modern-tool dives staked out the corners: Roam = block-graph but client-DB / in-app-API / store-minted UUID; Obsidian = file-over-app but page-level / path-addressed (block ^id opt-in); Notion = block-graph but closed hosted DB / external-API. Logseq sits in the middle they leave empty: block-graph semantics (UUID-addressable, embeddable, queryable blocks) stored as plain Markdown/Org files on disk you own, with a DataScript index derived from those files. It is the system that proves you do not have to choose between Roam's fine-grained addressing and Obsidian's file sovereignty — you can have block-level addressing that is also git-diffable text. That resolves a tension the synthesis flagged (addressing spectrum) and is the reason it earns a memo rather than a footnote.

Contrast set: Roam (block-DB, in-app), Obsidian (file, page-level), Notion (hosted DB), Joplin (SQLite-local, files-on-sync). Logseq = block-graph, files-canonical, index-derived — and now also migrating toward SQLite (§5), a live UC-43 case.


1. Core architecture — files canonical, DataScript derived

  • Storage: plain-text Markdown or Org-mode files on diskpages/<Page>.md and journals/<date>.md — fully usable even if the app disappears (local-first, open source, ClojureScript).
  • Index: a DataScript in-memory graph database, derived from the files: an mldoc parser turns Markdown into an AST, extracts references/properties, and transforms it into DataScript entities with tree-position attributes. Files are the source of truth; the graph DB is a rebuildable projection. (In current versions a DB Worker manages both DataScript and SQLite connections — see §5.)
  • This is the files-canonical / index-derived architecture (Obsidian's MetadataCache, Git's working tree) — but here the derived index is a full Datalog-queryable graph, not just a metadata cache. Logseq is the strongest evidence that a file-backed shard can support rich structured queries via a derived index (UC-52, UC-63).

2. The block-graph, in the file text

Everything is an atomic block (a bullet), individually referenceable, embeddable, and queryable — Roam's model — but the addressing lives in the Markdown:

  • Block IDs are in-file properties. When a block is referenced, Logseq writes an id:: <uuid> property line into the Markdown. So the block's stable address is git-diffable text that survives a file copy, not a DB-minted hidden key. This is the sweet spot of the addressing spectrum: block-level (like Roam) and in-file/ portable (like Obsidian's ^id), without choosing.
  • References: [[page]], ((block-uuid)), #tag — all extracted into the graph; block embeds {{embed ((uuid))}} are transclusion at block granularity (UC-32).
  • Properties: key:: value lines attach typed metadata to blocks (block properties) and pages (first-block/page properties) — git-diffable structured data at block granularity (UC-34), queried by Datalog.
  • Outline tree: a page is a nested tree of blocks (indent = structure), with outliner operations — indent/outdent, move subtree, zoom/narrow into a block. The page is not flat prose; it is an addressable, reorderable block tree (UC-63).

3. Queries — Datalog over the derived graph

Logseq exposes {{query}} (simple) and advanced Datalog queries over the DataScript graph (and via the plugin API, logseq.DB.datascriptQuery). Because the graph is derived from files, query-defined pages (UC-54) and structured aggregation (tasks, tagged blocks) run over a file-backed store. Key lesson: query capability here is neither native-DB (Notion) nor a third-party plugin (Obsidian Dataview) — it is built into the tool over its own files, demonstrating that a derived query index is a viable adapter/orchestrator capability for file shards (UC-63).


4. Extension surface — plugin API + marketplace

Logseq has a JS plugin API (marketplace ~486 plugins, logseq/marketplace):

  • logseq.Editor — block/page CRUD, insert/move/update/delete, get current block/page.
  • logseq.DBdatascriptQuery (Datalog), plus DB change subscriptions.
  • logseq.App — app-level ops/events; logseq.UI; logseq.Git (git ops on the graph); logseq.Assets; logseq.FileStorage.
  • Slash commands, block/page context menus, provideModel/provideStyle, hooks/events.

So, like Obsidian, Logseq is dual-attachable: (a) file-store direct — read the pages/+journals/ Markdown with a Logseq format profile (parse id::, ((uuid)), key::, the outline tree); (b) in-app plugin hostlogseq.Editor write + logseq.DB query + change events for live write-through. A notable extra: a built-in logseq.Git surface — the tool treats git as a first-class companion to the file graph (validating the coordination journal).


5. The file→DB migration — a live UC-43

Logseq is migrating its storage model from Markdown-files to a SQLite "DB graph" (the DB Worker already manages SQLite alongside DataScript; the plugin API has a distinct "DB graph" mode with tags/classes/typed properties). This is a real-world instance of UC-43 (backend-swap under stable identity): the same tool and graph identity moving from a file substrate to a DB substrate, trading git-diffability for richer typed structure (toward the Notion/XWiki end). For shard-wiki it is both a caution (a shard's substrate can change under it) and a confirmation that the addressing/structure/history spectra are trajectories tools actually travel — an adapter keyed to a fixed substrate will break.


6. Logseq as a shard — capability profile

Capability Logseq (MD-file graph) Notes for the adapter contract
Read yes file-store direct (pages/journals MD) or logseq.Editor/logseq.DB in-app
Write yes direct file write (format-aware) or plugin; block-level
Write granularity per-block (in a per-file page) finer than Obsidian (per-file), like Roam — but on files
Identity / addressing in-file block id:: uuid + ((uuid)) block-level and git-diffable — the addressing sweet spot (UC-51)
Structure key:: value block/page properties; outline tree git-diffable structured data at block granularity (UC-34)
History none native; logseq.Git + files = git-native git-friendly out of the box (adopt, not supplement)
Native query Datalog over derived DataScript derived index over files → delegate or rebuild (UC-52, UC-63)
Transclusion block embeds {{embed ((uuid))}} in-file, addressable (UC-32)
Backlinks linked + unlinked references derived (UC-05/18)
Content types Markdown/Org + whiteboards (tldraw JSON) non-Markdown spatial content (UC-55)
Substrate files now, SQLite "DB graph" emerging live backend-swap (UC-43)
Attach modes file-store direct (format profile) · in-app plugin dual, per-binding (UC-62)

Verdict: the most shard-wiki-friendly block tool — block-graph power with file-over- app sovereignty and git-diffable addressing/structure. Best attach: file-store direct with a Logseq format profile (offline, git-native), with the plugin for live write- through. Watch the DB-graph migration (UC-43).


7. Mapping to shard-wiki INTENT (compare, do not equate)

7.1 Reinforcements

  • Resolves the addressing tension. Block-level addressing can be git-diffable in-file (id::), not forced to choose Roam-DB vs Obsidian-page (UC-51, synthesis §2).
  • Confirms files-canonical / index-derived at full power — a Datalog graph derived from files, not just a metadata cache (UC-52, UC-63).
  • Structure as git-diffable text (key::) reinforces "prefer in-text structure" (UC-34, synthesis through-line).
  • Outliner block tree validates the page-as-addressable-block-tree demand (UC-50, UC-63) on a file backend.

7.2 Deliberate divergences (design bugs if conflated)

  1. One graph = one shard. Local-first single graph; not the federation layer.
  2. The MD files carry Logseq-specific syntax (id::, ((uuid)), key::, outline semantics) — a format profile, not plain CommonMark. A naïve Markdown reader will mangle block IDs/properties (cf. UC-42; lighter than Notion's lossy case).
  3. The substrate is moving (file→SQLite). Don't hard-code the file model; gate on the substrate capability and tolerate the swap (UC-43).
  4. Whiteboards are not Markdown — typed/opaque assets, not flattened (UC-55).

7.3 What Logseq teaches that shard-wiki should keep

  • In-file block addressing is the target shape for a portable span address where the backend cooperates — adopt id::-style schemes; they are git-diffable and survive copy.
  • A derived query index over files is a first-class capability — shard-wiki can build one over a file shard (UC-63) when the backend exposes none, or delegate when it does.
  • Expect substrate migration — bind to capabilities, not to "it's files."

8. Use-case seeds → catalog (promoted 2026-06-14)

Last existing UC is UC-61. New UCs UC-62, UC-63 added; existing UCs enriched.

Seed Catalog action
Attach a block-graph-on-plain-files shard (Logseq-style): file-over-app Markdown carrying in-file block IDs (id::), block refs, and key:: properties, with the outline tree and a derivable query index UC-62 (new)
Serve structured queries over a file-backed shard via a derived index the orchestrator/adapter builds (Logseq DataScript-over-files) when the backend exposes none UC-63 (new)
In-file block id:: = block-level and git-diffable addressing (the spectrum sweet spot) enriches UC-51
Live file→SQLite substrate migration under stable graph identity enriches UC-43
Block-graph that is files, not a DB — the file-store variant of the block tool family enriches UC-50
Datalog over a derived index built from files enriches UC-52
key:: value block/page properties in-text enriches UC-34
Whiteboards (tldraw JSON) = non-Markdown content enriches UC-55
Block embeds {{embed ((uuid))}} = in-file transclusion links UC-32

9. Architecture notes for SHARD-WP-0002 (no UC)

  • Format-aware file-store profiles (already flagged by Joplin, UC-60) gain a strong case here: a Logseq profile parses id::/((uuid))/key::/outline from otherwise- plain Markdown. The contract should let a file-store adapter declare its format profile (plain MD / Obsidian / Logseq / Joplin-item / Foswiki-PlainFile). (T11/T14.)
  • Derived-query-index as an orchestrator capability (UC-63): when a file shard has no native query engine, build a DataScript-like index over the projection; when it does (Roam/Notion/XWiki), delegate (UC-52). Both belong in T16's navigation layer + T11.
  • Substrate-migration tolerance (UC-43, Logseq file→SQLite): T14 binding should treat substrate as a capability that can change under a live attachment, preserving identity.
  • In-file block addressing is the concrete realization of T16's span-address thread — prefer id::-style git-diffable IDs over DB-minted where the backend allows.

10. Open questions (for spec / workplans)

  1. When the same tool offers file and DB substrates (Logseq now), does shard-wiki prefer the file graph (git-diffable, UC-62) or the DB graph (richer typed structure), and can one binding follow the migration (UC-43)?
  2. Is the derived query index (UC-63) built by the adapter (per-shard) or the core orchestrator (over the union), and is it persisted or rebuilt?
  3. How much Logseq outline semantics (zoom, subtree move) must shard-wiki preserve vs. present as a flat page (UC-63 vs. Markdown-first page model)?
  4. Does the Logseq format profile round-trip overlays (write id::/key:: back) or only read them? (cf. UC-42 round-trip question.)

11. Sources

Source Used for
DeepWiki — logseq/logseq (https://deepwiki.com/logseq/logseq) DataScript+Datalog graph, mldoc AST parse, block entities/tree, DB Worker managing DataScript and SQLite
logseq/docs (https://deepwiki.com/logseq/docs) Plain-text MD/Org files; pages/journals; references ( , (( )), #tag); properties
Logseq Plugin API docs (https://plugins-doc.logseq.com/) logseq.App/Editor/DB/Git/UI/Assets/FileStorage; DB.datascriptQuery
logseq/marketplace (https://github.com/logseq/marketplace) Plugin distribution; ~486 plugins
kerim/logseq-db-plugin-api-skill (https://github.com/kerim/logseq-db-plugin-api-skill) DB-graph version: tags/classes, typed properties, file→DB migration
pangea.app glossary — Logseq (https://pangea.app/glossary/logseq) Local-first framing, outliner, plain-text control

Cross-references: research/260614-roam-deep-dive/findings.md (block-DB contrast), research/260614-obsidian-deep-dive/findings.md (file-over-app, derived index), research/260614-joplin-deep-dive/findings.md (format-aware file-store profiles), research/260614-shard-spectrum-synthesis/findings.md (addressing spectrum this resolves), spec/UseCaseCatalog.md (UC-32, UC-34, UC-43, UC-50, UC-51, UC-52, UC-55), workplans/SHARD-WP-0002-federation-architecture.md (T11, T14, T16).


12. Traceability

  • New UCs: UC-62, UC-63spec/UseCaseCatalog.md.
  • Enriched UCs: UC-32, UC-34, UC-43, UC-50, UC-51, UC-52, UC-55.
  • Architecture (no UC): format-aware file-store profiles (Logseq profile); derived-query- index capability; substrate-migration tolerance; in-file block addressing as the T16 span-address target → SHARD-WP-0002 (T11, T14, T16).
  • Boundary recorded: Logseq is one block-graph-on-files candidate shard (the addressing sweet spot), best attached file-store-direct with a format profile; not the federation layer; substrate is migrating file→SQLite (UC-43).