tegwick c000ce6f73 Wire pdfjs cmaps + standard fonts so text layer positions correctly
Strong likelihood that the "text layer is misplaced / body text not
selectable" symptoms across multiple PDFs come from PDF.js falling
back to substitute font metrics. Without the cmaps directory (CID
character maps for non-Latin fonts) and the standard_fonts directory
(Helvetica/Times/Courier metrics for unembedded standard fonts), the
canvas glyphs use embedded font data while the text-layer span
positions are computed from fallback metrics. The two diverge — text
spans land in the wrong place, or text content can't be decoded at
all, leaving the body unselectable.

Both directories are now copied into the served root by
vite-plugin-static-copy and passed to pdfjs.getDocument() as
`cMapUrl: "/cmaps/"` + `cMapPacked: true` + `standardFontDataUrl:
"/standard_fonts/"` via PdfLoader's `document` prop (which accepts a
full DocumentInitParameters object).
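
A minimal sketch of the wiring described above, not the code from this commit. It assumes pdfjs-dist keeps its assets under node_modules/pdfjs-dist/cmaps and node_modules/pdfjs-dist/standard_fonts. `PdfAssetParams` is a local stand-in for the relevant subset of DocumentInitParameters, and `pdfjsAssetTargets` and `buildPdfDocumentParams` are illustrative names.

```typescript
// Copy targets in the { src, dest } shape that vite-plugin-static-copy takes
// in its `targets` option. ASSUMPTION: the src paths reflect where pdfjs-dist
// keeps these directories inside node_modules.
const pdfjsAssetTargets = [
  { src: "node_modules/pdfjs-dist/cmaps/*", dest: "cmaps" },
  { src: "node_modules/pdfjs-dist/standard_fonts/*", dest: "standard_fonts" },
];

// Local stand-in for the subset of DocumentInitParameters used here.
interface PdfAssetParams {
  url: string;
  cMapUrl: string;
  cMapPacked: boolean;
  standardFontDataUrl: string;
}

// Build the object handed to pdfjs.getDocument() or to PdfLoader's `document` prop.
// `assetRoot` is a hypothetical parameter for apps not served from "/".
function buildPdfDocumentParams(url: string, assetRoot: string = "/"): PdfAssetParams {
  // pdfjs appends file names to these URLs, so each must end with a slash.
  const root = assetRoot.slice(-1) === "/" ? assetRoot : assetRoot + "/";
  return {
    url: url,
    cMapUrl: root + "cmaps/",
    cMapPacked: true,
    standardFontDataUrl: root + "standard_fonts/",
  };
}
```

With the default root, `buildPdfDocumentParams("/sample.pdf")` yields the same `cMapUrl`, `cMapPacked`, and `standardFontDataUrl` values quoted in the message above.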

If this is the right diagnosis, the textLayer overlay should now line
up with the visible glyphs on the same PDFs that were producing
fragmented captures. If the body text is still unselectable, the PDF
genuinely lacks a text layer for those glyphs (image-only content)
and OCR would be the only path forward.
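
One hedged way to tell the two outcomes apart is to look at what PDF.js extracts for a page. In the sketch below, `TextContentLike` is a local stand-in for the shape returned by `page.getTextContent()` (an `items` array whose text entries carry a `str` field), and `hasExtractableText` is a hypothetical helper, not part of this commit.

```typescript
// Local stand-in for the result of page.getTextContent(): text items carry
// `str`; marked-content entries do not.
interface TextContentLike {
  items: Array<{ str?: string }>;
}

// Returns true when at least one item holds non-whitespace text.
// A false result suggests an image-only page, where OCR is the likely path.
function hasExtractableText(content: TextContentLike): boolean {
  return content.items.some(function (item) {
    return typeof item.str === "string" && item.str.trim().length > 0;
  });
}
```

This is a page-level check only: a page can carry some text while the body glyphs in question are still rasterised images.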

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 00:38:34 +02:00
2026-05-24 13:20:45 +00:00
2026-05-26 15:17:33 +02:00

citation-evidence

A document-centered evidence workspace for capturing, managing, presenting, and re-opening citations. This repository is the umbrella over the six-package design described in INTENT.md and wiki/ArchitectureOverview.md.

During the MVP, all code lives here under src/ (see "Repository layout" below). Sister repos hold only INTENT.md; code migrates outward as each subsystem stabilises.

Documentation

Where             What
INTENT.md         Project intent, scope, the umbrella-first decision
wiki/             PRD, Architecture, SharedContracts, DependencyMap
docs/decisions/   ADRs (architecturally significant decisions)
workplans/        Ralph-driven workplans that implement the MVP slice
history/          Time-stamped assessments and post-mortems

The canonical contracts are in wiki/SharedContracts.md; the partition boundaries are in wiki/DependencyMap.md. Both are referenced from every workplan and from each sister repo's INTENT.md.

Repository layout

src/
  shared/   # vocabulary, types, pure helpers      → becomes part of citation-engine
  engine/   # services, repositories, event bus    → becomes part of citation-engine
  anchor/   # selector creation/resolution, viewer adapter contract → becomes evidence-anchor
  source/   # ingest, fingerprint, extraction, recovery → becomes evidence-source
  binder/   # evidence-to-target binding, visual guide → becomes evidence-binder
  work/     # review UI (sidebar, viewer shell)    → becomes citation-work
  app/      # the reference workspace shell        → stays in citation-evidence

The dependency-edge rules between partitions are enforced by ESLint via eslint-plugin-boundaries (see eslint.config.js). Extraction to a sister repo is intended to be a git mv plus a package.json cut — nothing more.
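
As an illustration of what a dependency-edge rule checks, here is a minimal sketch in plain TypeScript. The partition names come from the layout above, but the `allowedImports` table is HYPOTHETICAL: the real edges are defined in eslint.config.js and wiki/DependencyMap.md, and `partitionOf` and `isImportAllowed` are illustrative helpers, not part of eslint-plugin-boundaries.

```typescript
type Partition = "shared" | "engine" | "anchor" | "source" | "binder" | "work" | "app";

// HYPOTHETICAL edge table, for illustration only. Each key lists the
// partitions it may import from; the real rules live in eslint.config.js.
const allowedImports: Record<Partition, Partition[]> = {
  shared: [],
  engine: ["shared"],
  anchor: ["shared"],
  source: ["shared"],
  binder: ["shared", "anchor"],
  work: ["shared", "engine"],
  app: ["shared", "engine", "anchor", "source", "binder", "work"],
};

// Map a repo-relative path such as "src/engine/bus.ts" to its partition.
function partitionOf(filePath: string): Partition | undefined {
  const match = /^src\/([^/]+)\//.exec(filePath);
  const name = match ? match[1] : undefined;
  if (name === undefined) return undefined;
  return Object.prototype.hasOwnProperty.call(allowedImports, name)
    ? (name as Partition)
    : undefined;
}

// An import is allowed inside one partition, or along an edge in the table.
function isImportAllowed(fromFile: string, toFile: string): boolean {
  const from = partitionOf(fromFile);
  const to = partitionOf(toFile);
  if (from === undefined || to === undefined) return false;
  return from === to || allowedImports[from].indexOf(to) !== -1;
}
```

Under this hypothetical table, a file in src/engine/ may import from src/shared/, while the reverse import is rejected.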

Sister repos

Peers under ~/; each holds only INTENT.md during the MVP.

Dev workflow

Requirements: Node 20 LTS (see .nvmrc) and pnpm 9.

pnpm install
pnpm dev        # vite dev server (once src/app/ has a real entry)
pnpm test       # vitest one-shot
pnpm test:watch
pnpm lint       # eslint with boundary rules
pnpm typecheck  # tsc --noEmit
pnpm build      # production bundle

Workplans (Ralph)

Workplans drive incremental implementation through the ralph loop. The harness lives in ~/ralph-workplan/; see workplans/README.md for the active list and ordering.

/ralph-workplan workplans/CE-WP-0001-foundations.md

The loop self-retires when every task in the file has status: done and the workplan's frontmatter also has status: done.
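
The retirement condition can be sketched as a single predicate. The `Workplan` shape and the name `shouldRetire` are hypothetical; the actual harness in ~/ralph-workplan/ may parse and represent workplans differently.

```typescript
// HYPOTHETICAL in-memory shape of a parsed workplan file.
interface Workplan {
  frontmatterStatus: string;
  tasks: Array<{ id: string; status: string }>;
}

// True only when the frontmatter and every task report "done".
// Note: a workplan with no tasks passes the task check vacuously.
function shouldRetire(plan: Workplan): boolean {
  return (
    plan.frontmatterStatus === "done" &&
    plan.tasks.every(function (task) {
      return task.status === "done";
    })
  );
}
```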

License: MIT-0