generated from coulomb/repo-seed
Add INTENT.md/SCOPE.md, reconcile PRD scope, rename content fingerprint
- Add INTENT.md (purpose and inviolable principles) and SCOPE.md (current operational boundary), matching the binect-js house style.
- Reconcile the PRD with the shipped document-lifecycle scope: add ordering/server-sync requirements (4.3a), split the proxy queue vs. tracking-log caps (4.6.3), and update the solution summary/closing.
- Rename computeMD5 -> computeContentFingerprint to be honest: it is a fast sampled non-cryptographic fingerprint for dedup, not MD5.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
77
INTENT.md
Normal file
@@ -0,0 +1,77 @@
# INTENT.md

> The purpose of this document is to capture **why BinectChrome exists and what it must remain**, independent of any specific line of code. Where the [README](README.md) explains *how to use* the extension and [CLAUDE.md](CLAUDE.md) explains *how the code is structured*, this file records the **intent** that all of those serve. The full requirements live in [`specs/ProductRequirementsDocument.md`](specs/ProductRequirementsDocument.md); the concrete delivered surface lives in [`SCOPE.md`](SCOPE.md). If a future change conflicts with what is written here, the change is suspect — not the intent.

## 1. Core Intent

> **Let a user send a PDF produced in *any* cloud application directly to Binect for physical mail — from the browser where they already work, with explicit intent, and without the extension ever holding their documents.**

BinectChrome is a thin, trustworthy bridge between the browser and the [Binect API](https://app.binect.de/index.jsp?id=api). It collapses the manual **download → re-upload** loop into a single deliberate click, and it does so without asking the source application to change anything.

## 2. What Problem This Solves

Users routinely generate PDFs — letters, invoices, notices — in cloud applications that have no connection to a postal-mail service. Today, sending one physically means: download it from application A, upload it to application B (Binect), repeat for every document. This is slow, error-prone, and discouraging at volume.

BinectChrome targets exactly that friction and nothing beyond it: it notices the PDF the user just produced and offers to send it onward for printing and delivery.

## 3. The Core User Journey

This is the path that must always work end-to-end. Everything else is secondary.

1. The user generates or downloads a PDF in some web application.
2. The extension **detects** it and surfaces it — a toolbar badge and a popup entry showing filename, size, source domain, and time.
3. The user opens the popup, reviews the document, and clicks **Send to Binect**.
4. The extension **re-acquires the PDF bytes** (re-fetching from the original URL using the user's session) and **uploads** them to Binect via the official API, showing unambiguous Uploading → Success / Failure states.
5. The transfer is **recorded locally** for transparency; the user can review history, follow the document through its Binect lifecycle, and report issues via feedback.

If this journey breaks, the product is broken.

## 4. Inviolable Principles

These are the boundary conditions that keep BinectChrome *BinectChrome*. They are constraints, not preferences.

| Principle | Meaning | Consequence |
|-----------|---------|-------------|
| **Explicit user intent** | Nothing is ever sent, ordered, or deleted without a deliberate user click. | No automatic or background dispatch, ever. Sending physical mail costs money and is irreversible. |
| **Zero document retention** | The extension never stores or inspects PDF content. | PDF bytes exist in memory only during an active transfer, then are gone. Only technical metadata is tracked. |
| **Local-only data** | Credentials and tracking history live in the browser and nowhere else. | Tracking data leaves the device only when the user explicitly sends a feedback email. No telemetry. |
| **No backend relay** | The extension talks directly to the Binect API. | There is no BinectChrome server, and no server-side state tied to an installation. |
| **Credentials at rest, encrypted and expiring** | Username/password are AES-GCM encrypted at rest, decrypted only in memory, auto-expire after 60 days of inactivity, and are manually wipeable. | "Use" resets the clock; abandonment erases the secret. |
| **Least privilege** | Request only the permissions the journey actually needs, and justify each. | A permission that doesn't serve §3 doesn't belong in the manifest — it is review-cost the user pays for. |
| **Clean removal** | Uninstalling leaves nothing behind. | All state is local, so removal is complete by construction. |
| **Delegate the API, don't reinvent it** | Binect integration goes through the [`@binect/js`](../binect-js) library. | API-shape changes belong upstream; this repo stays a thin wrapper. |

## 5. Explicitly Out of Scope

BinectChrome is deliberately **not**:

- A document store, viewer, editor, or content analyzer.
- A backend relay or any server-side component.
- An automation / RPA tool that drives third-party sites or sends without a click.
- A credential-federation or shared-identity layer.
- A cross-browser product in v1 — Chrome (Chromium-based, Manifest V3) only.
- A telemetry or analytics collector.

When a feature request can only be satisfied by crossing one of these lines, the correct answer is to decline and document why.

## 6. Success Looks Like

- A user sends a freshly generated PDF to physical mail **without leaving the browser or touching a second app**.
- A transfer's progress and outcome are always legible: Uploading, Success, or an actionable Failure.
- A user can see exactly what was sent, where it came from, and what it cost — entirely from local history.
- Credentials are protected at rest and disappear on their own when unused.
- The extension passes Chrome Web Store review on a minimal permission set.
- No privacy or security incident is ever traceable to the extension holding data it shouldn't.

## 7. How to Use This Document

- **Before adding a feature:** confirm it serves §1 and §3 and violates none of §4. If it requires the extension to retain documents, send without intent, or stand up a backend, it does not belong here.
- **When the Binect API evolves:** adapt through `@binect/js`; preserve the intent. Product intent (this file) stays stable even as API details change.
- **When in doubt:** any decision must be explainable as a direct consequence of the Core Intent in §1.

## 8. Related Documents

- [`SCOPE.md`](SCOPE.md) — the concrete, current operational boundary of what is delivered
- [`specs/ProductRequirementsDocument.md`](specs/ProductRequirementsDocument.md) — the full PRD this intent distills
- [`architecture/ADR-001-credential-encryption.md`](architecture/ADR-001-credential-encryption.md) — the credential-encryption decision
- [`CLAUDE.md`](CLAUDE.md) — architecture and operating instructions for contributors
- [`README.md`](README.md) — usage and developer setup

112
SCOPE.md
Normal file
@@ -0,0 +1,112 @@
# SCOPE.md

> This document defines **what BinectChrome does and does not cover**, concretely and as currently built. Where [`INTENT.md`](INTENT.md) records *why* the project exists and the principles it must uphold, this file draws the **operational boundary**: which capabilities are implemented, which are explicitly excluded, and where the edges are. A feature request is answered first by checking it against this document. It reflects the state of the code at the 2026-06 lifecycle reconciliation.

## 1. Scope Statement

BinectChrome covers **detecting a PDF in Chrome and sending it to Binect for physical mail, then tracking that document through its Binect lifecycle** — entirely from the browser, with no backend and no stored document content. It covers the detection, send, order, status, and local-tracking surface needed for that journey, and a thin credential layer to authenticate. It covers nothing on the server side beyond calling the Binect API, and nothing about creating or editing the documents themselves.

## 2. In Scope (Implemented)

### 2.1 PDF Detection (`src/utils/pdf-detector.ts`)

| Capability | Status | Notes |
|------------|--------|-------|
| Detect completed PDF downloads via Chrome Downloads API | ✅ | By `.pdf` extension or `application/pdf` MIME. |
| Scan recent downloads on popup open | ✅ | `getLastPDFDownload` / popup `checkRecentDownloads`. |
| Detect PDF in the current tab | ✅ (best effort) | `checkCurrentTabForPDF` in popup. |
| Re-fetch PDF bytes from original URL using the user session | ✅ | `fetchPDFBytes`, `credentials: 'include'`. |
| Blob-URL / complex-JS-viewer PDFs | ❌ (accepted limitation) | Not reliably detectable or retrievable — by design. |
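
The first row of the table (detection by `.pdf` extension or `application/pdf` MIME) can be illustrated with a small predicate. This is a minimal sketch, not the contents of `src/utils/pdf-detector.ts`: `DownloadLike` and `isCompletedPdf` are hypothetical names, and the interface only mirrors the `filename`, `mime`, and `state` fields of Chrome's `DownloadItem`.

```typescript
// Hypothetical, reduced shape of a download record. The real
// chrome.downloads.DownloadItem carries many more fields.
interface DownloadLike {
  filename: string;
  mime?: string;
  state: 'in_progress' | 'interrupted' | 'complete';
}

// True when a completed download looks like a PDF, judged by file
// extension or by MIME type. Either signal alone is accepted.
function isCompletedPdf(item: DownloadLike): boolean {
  if (item.state !== 'complete') {
    return false;
  }
  const byExtension = /\.pdf$/i.test(item.filename);
  const byMime = /^application\/pdf\b/i.test(item.mime || '');
  return byExtension || byMime;
}
```

Accepting either signal keeps detection tolerant of servers that send a generic MIME type for a `.pdf` file, or a correct MIME type for a file without an extension.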

### 2.2 Document Proxy Queue & Lifecycle (`src/utils/pdf-queue.ts`)

- **Proxies**: metadata-only records of detected/sent PDFs (`DocumentProxy`); **never contain PDF content**.
- **Deduplication** by filename + content hash (`src/utils/hash.ts`).
- **Lifecycle states**: `pending → uploading → in_basket → ordering → in_production → sent`, plus `failed` and `canceled`, mirroring the Binect server status.
- **Live vs. archived** views; archived proxies age out after ~30 days; queue capped at ~100 entries.
- **Server sync / reconciliation**: `syncFromServer`, `attachServerDocument`, `clearServerFields` — adopt server-discovered documents, update statuses, and detach proxies for documents deleted upstream.
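
The lifecycle and deduplication rules above can be sketched as a transition table plus a dedup key. Only the state names come from this document. The allowed transitions (including the retry from `failed` back to `pending`), and the names `LifecycleStatus`, `canTransition`, and `dedupKey`, are assumptions made for illustration and may differ from `src/utils/pdf-queue.ts`.

```typescript
type LifecycleStatus =
  | 'pending' | 'uploading' | 'in_basket' | 'ordering'
  | 'in_production' | 'sent' | 'failed' | 'canceled';

// Assumed transition table: forward along the happy path, with the
// off-path states reachable where a step can fail or be abandoned.
const NEXT_STATES: Record<LifecycleStatus, LifecycleStatus[]> = {
  pending: ['uploading', 'canceled'],
  uploading: ['in_basket', 'failed'],
  in_basket: ['ordering', 'canceled'],
  ordering: ['in_production', 'failed'],
  in_production: ['sent', 'failed'],
  sent: [],
  failed: ['pending'], // assumption: a retry re-queues the document
  canceled: [],
};

function canTransition(from: LifecycleStatus, to: LifecycleStatus): boolean {
  return NEXT_STATES[from].indexOf(to) !== -1;
}

// Dedup key built from filename + content hash. JSON encoding avoids
// ambiguity when a filename happens to contain a separator character.
function dedupKey(proxy: { filename: string; contentHash: string }): string {
  return JSON.stringify([proxy.filename, proxy.contentHash]);
}
```

Note that `in_basket` cannot jump to `sent`: the only way forward is through `ordering`, which is the explicit user step.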

### 2.3 Binect API Operations (`src/utils/binect-api.ts`, via `@binect/js`)

All Binect access is delegated to the [`@binect/js`](../binect-js) SDK; this module is a thin wrapper with extension-friendly types and error mapping.

| Operation | Status |
|-----------|--------|
| `uploadPDF` — base64 upload, places document in basket | ✅ |
| `shipDocument` — place the print/delivery order | ✅ |
| `getDocumentStatus` — per-document status refresh | ✅ |
| `listServerDocuments` — list documents Binect holds (for sync) | ✅ |
| `deleteDocument` — remove a document server-side | ✅ |
| `testConnection` — credential validation | ✅ |
| Structured errors (`BinectAPIError`, auth/size/4xx mapping) | ✅ |
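
The "auth/size/4xx mapping" in the last row can be pictured as a pure function from HTTP status to an error class. This is a hedged sketch only: `ErrorKind`, `MappedError`, and `mapHttpStatus` are hypothetical names, and the specific codes (401/403 for authentication, 413 for size) follow general HTTP semantics rather than anything this document states about the Binect API or about `BinectAPIError`.

```typescript
type ErrorKind = 'auth' | 'too_large' | 'client' | 'server' | 'unknown';

interface MappedError {
  kind: ErrorKind;
  status: number;
  retryable: boolean; // assumption: only server-side failures are worth retrying
}

// Map an HTTP status code to a coarse error class the UI can act on.
function mapHttpStatus(status: number): MappedError {
  if (status === 401 || status === 403) {
    return { kind: 'auth', status: status, retryable: false };
  }
  if (status === 413) {
    return { kind: 'too_large', status: status, retryable: false };
  }
  if (status >= 400 && status < 500) {
    return { kind: 'client', status: status, retryable: false };
  }
  if (status >= 500 && status < 600) {
    return { kind: 'server', status: status, retryable: true };
  }
  return { kind: 'unknown', status: status, retryable: false };
}
```

Keeping the mapping separate from the SDK call makes it testable without mocking `@binect/js`.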

### 2.4 Authentication & Credentials (`src/utils/crypto.ts`, `src/utils/storage.ts`)

- Username + password (HTTP Basic, per the Binect API).
- **AES-GCM (256-bit)** encryption at rest via the Web Crypto API; key stored in `chrome.storage.local`; decrypted only in memory.
- **60-day inactivity expiry**: `lastUse` timestamp refreshed on use; expired credentials auto-deleted on next load and by a daily `chrome.alarms` check.
- Manual wipe (logout) always available; corrupted ciphertext is self-deleting.
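
The 60-day inactivity rule above reduces to a timestamp comparison. The sketch below assumes a stored record with a `lastUse` field in epoch milliseconds, as the bullet suggests; `StoredCredentials`, `isExpired`, and `touch` are hypothetical names, and treating exactly 60 days as still valid is an assumption.

```typescript
const DAY_IN_MS = 24 * 60 * 60 * 1000;
const INACTIVITY_LIMIT_MS = 60 * DAY_IN_MS;

// Hypothetical stored shape: opaque ciphertext plus the last-use time.
interface StoredCredentials {
  ciphertext: string;
  lastUse: number; // epoch milliseconds
}

// Expired once more than 60 days have passed since the last use.
function isExpired(creds: StoredCredentials, now: number): boolean {
  return now - creds.lastUse > INACTIVITY_LIMIT_MS;
}

// "Use resets the clock": return a copy with a refreshed timestamp.
function touch(creds: StoredCredentials, now: number): StoredCredentials {
  return { ciphertext: creds.ciphertext, lastUse: now };
}
```

Passing `now` in explicitly, rather than calling `Date.now()` inside, lets both the load path and the daily alarm share the same check and keeps it easy to test.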

### 2.5 User Interface (`src/popup/`, `src/tracking/`)

- **Popup**: login view, document list grouped by lifecycle stage (pending / erroneous / basket / production / completed / archived), send / order / refresh / archive / restore / delete actions, password-visibility toggle, badge, first-run pin reminder, auto-refresh.
- **Toolbar badge**: actionable count, or a `•` idle indicator (Binect blue).
- **Tracking / Help page** (`src/tracking/`): summary counts, chronological transfer list, accessible via the popup "?" link.
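
The badge rule (an actionable count, or `•` when idle) can be sketched as two small functions. Which states count as "actionable" is not spelled out here, so the `ACTIONABLE_STATES` list below is an assumption, as are the names `BadgeStatus`, `countActionable`, and `badgeText`.

```typescript
type BadgeStatus =
  | 'pending' | 'uploading' | 'in_basket' | 'ordering'
  | 'in_production' | 'sent' | 'failed' | 'canceled';

// Assumption: states in which the user has something to do.
const ACTIONABLE_STATES: BadgeStatus[] = ['pending', 'in_basket', 'failed'];

function countActionable(statuses: BadgeStatus[]): number {
  return statuses.filter(function (s) {
    return ACTIONABLE_STATES.indexOf(s) !== -1;
  }).length;
}

// Badge shows the count when there is work to do, otherwise the idle dot.
function badgeText(actionable: number): string {
  return actionable > 0 ? String(actionable) : '•';
}
```

In the extension, the resulting string would typically be handed to `chrome.action.setBadgeText({ text })`; that call is left out so the sketch runs outside a browser.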

### 2.6 Local Tracking & Feedback (`src/tracking/tracker.ts`)

- Append-only transfer log (`TrackingEntry`): timestamp, source, destination, filesize, result/error class; **local-only**, capped at ~500 events.
- `getTrackingSummary` for counts; `exportAsCSV` for export.
- Feedback opens an email draft to `bernd.worsch@binect.de`; tracking data exportable as CSV (body / clipboard).
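
A capped append-only log with CSV export, as described above, might look like the following. This is an illustration under stated assumptions, not the code in `src/tracking/tracker.ts`: the field names of `LogEntry` are inferred from the bullet list, and `appendCapped`, `csvField`, and `toCSV` are hypothetical helpers rather than the real `exportAsCSV`.

```typescript
// Field names inferred from the list above; the real TrackingEntry may differ.
interface LogEntry {
  timestamp: string;
  source: string;
  destination: string;
  filesize: number;
  result: string;
}

const MAX_LOG_EVENTS = 500;

// Append one entry and drop the oldest ones once the cap is exceeded.
function appendCapped(log: LogEntry[], entry: LogEntry, cap: number = MAX_LOG_EVENTS): LogEntry[] {
  const next = log.concat([entry]);
  return next.length > cap ? next.slice(next.length - cap) : next;
}

// Quote a field when it contains a comma, quote, or line break (RFC 4180 style).
function csvField(value: string | number): string {
  const text = String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

function toCSV(log: LogEntry[]): string {
  const header = 'timestamp,source,destination,filesize,result';
  const rows = log.map(function (e) {
    return [e.timestamp, e.source, e.destination, e.filesize, e.result]
      .map(csvField)
      .join(',');
  });
  return [header].concat(rows).join('\r\n');
}
```

Returning a new array from `appendCapped` instead of mutating in place fits a store that is read, updated, and written back to `chrome.storage.local` as a whole.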

### 2.7 Service Worker & Platform (`src/background/service-worker.ts`, `public/manifest.json`)

- Manifest V3 service worker; message router for all popup ↔ background calls.
- `chrome.alarms` for credential-expiry and queue-cleanup ticks (survives worker suspension).
- Permissions requested: `downloads`, `storage`, `alarms`, `activeTab`; host access to `https://api.binect.de/*` and `<all_urls>`.

### 2.8 Supporting Material

- Tests (`tests/`): Jest unit tests for crypto, pdf-detector, binect-api, tracker (`@binect/js` mocked).
- Build: TypeScript + Webpack (`npm run build` → `dist/`); ESLint; `tsc` type-check.
- Docs: [`CLAUDE.md`](CLAUDE.md), [`README.md`](README.md), `DEVELOPMENT.md`, `architecture/ADR-001-credential-encryption.md`, detection/testing guides.

## 3. Out of Scope

Excluded by design. A request requiring any of these is a request for a *different* product (see [`INTENT.md` §5](INTENT.md)).

| Excluded | Reason |
|----------|--------|
| Storing, viewing, editing, or inspecting PDF content | Zero-retention principle; proxies hold metadata only. |
| Any server-side / backend component | The extension talks directly to Binect; no relay, no install-tied state. |
| Automatic or background sending, ordering, or deleting | Every dispatch action requires explicit user intent; mail costs money and is irreversible. |
| PDF generation, layout, or transformation | The user brings a finished PDF; document prep is the source app's job. |
| Reinterpreting or extending the Binect API | API behavior is delegated 1:1 to `@binect/js`; new coverage belongs upstream. |
| Cross-browser support (Firefox, Edge, …) | Chrome / Chromium MV3 only in v1. |
| Credential federation, SSO, token auth | Username + password only, until the API evolves. |
| Telemetry / analytics / remote logging | Tracking is local; data leaves only via explicit user feedback email. |
| Multi-profile destinations, rule-based automation, org policies | Listed as future considerations in the PRD §10, not built. |

## 4. Boundary Cases & Known Edges

- **Uploaded ≠ sent.** Uploading places a document in the Binect *basket* (shippable). Physical dispatch is a separate, explicitly confirmed "order" step. Conflating the two would violate the explicit-intent principle.
- **`<all_urls>` host permission.** Required so the extension can re-fetch PDFs from arbitrary source domains using the user's session. It is broader than the "minimal permissions" the PRD/INTENT aspire to and is a known **Chrome Web Store review cost** — it should be justified in the store listing or narrowed, not silently expanded.
- **Content hash is a fast non-cryptographic digest** (`computeContentFingerprint`, `src/utils/hash.ts`) — a sampled rolling fingerprint used only for deduplication, never for security.
- **Two capped stores, not one.** Lifecycle *proxies* (~100, ~30-day archive aging) are distinct from the transfer *log* (~500 events). See PRD §4.6.3.
- **`@binect/js` is a local file dependency** (`file:../binect-js`). The sibling repo must be present to build; it is not yet published to a registry.
- **Service-worker lifecycle.** All state persists in `chrome.storage.local`; nothing relies on in-memory background state surviving suspension.
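
The proxy-store limits named in the "two capped stores" bullet (~100 entries, ~30-day archive aging) could be enforced by a cleanup pass like the one below. It is a sketch under assumptions: `ProxyLike`, its `updatedAt` field, and the eviction order when over the cap (archived entries first, then the oldest) are not specified in this document.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const ARCHIVE_TTL_MS = 30 * DAY_MS;
const MAX_PROXIES = 100;

// Hypothetical reduced proxy shape: metadata only, never PDF content.
interface ProxyLike {
  id: string;
  archived: boolean;
  updatedAt: number; // epoch milliseconds
}

function pruneProxies(proxies: ProxyLike[], now: number): ProxyLike[] {
  // Step 1: archived proxies older than the TTL age out.
  const fresh = proxies.filter(function (p) {
    return !(p.archived && now - p.updatedAt > ARCHIVE_TTL_MS);
  });
  if (fresh.length <= MAX_PROXIES) {
    return fresh;
  }
  // Step 2 (assumed policy): keep live proxies ahead of archived ones,
  // and newer ahead of older, then cut at the cap.
  const ranked = fresh.slice().sort(function (a, b) {
    if (a.archived !== b.archived) {
      return a.archived ? 1 : -1;
    }
    return b.updatedAt - a.updatedAt;
  });
  return ranked.slice(0, MAX_PROXIES);
}
```

A pass like this is the kind of work the queue-cleanup `chrome.alarms` tick would trigger, since it needs no in-memory state beyond what is read from storage.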

## 5. Scope Change Process

1. Confirm the change violates no principle in [`INTENT.md` §4](INTENT.md) — especially zero-retention, explicit intent, no backend.
2. If it adds Binect API coverage, add it upstream in `@binect/js` and surface it through the thin wrapper — do not reimplement API logic here.
3. If it expands permissions, document the justification (review impact) before adding.
4. Update this file and, where the boundary genuinely moves, [`INTENT.md`](INTENT.md) and the [PRD](specs/ProductRequirementsDocument.md) together.

## 6. Related Documents

- [`INTENT.md`](INTENT.md) — why the project exists and its inviolable principles
- [`specs/ProductRequirementsDocument.md`](specs/ProductRequirementsDocument.md) — full PRD (reconciled with the lifecycle scope)
- [`architecture/ADR-001-credential-encryption.md`](architecture/ADR-001-credential-encryption.md) — credential encryption decision
- [`CLAUDE.md`](CLAUDE.md) — architecture and contributor instructions
- [`README.md`](README.md) — usage and developer setup

@@ -8,6 +8,22 @@ BinectChromePrd
 
 ---
 
+> **Revision note — scope evolution (2026-06).**
+> This PRD originally specified a minimal *detect → send* tool. The shipped
+> implementation has grown into a **document-lifecycle** assistant: it not only
+> uploads PDFs but tracks each one through its Binect server-side states
+> (in-basket → ordered → in production → sent), reconciles local records against
+> what Binect actually holds, and lets the user order and manage documents from
+> the popup. Sections **4.3a (Document Lifecycle & Ordering)** and **4.6
+> (Local Tracking)** below have been reconciled with this reality. The growth is
+> in-scope **only** because it never violates the inviolable principles in
+> [`INTENT.md`](../INTENT.md) §4 — in particular, the lifecycle is represented by
+> *metadata proxies that never hold PDF content*, and every server action (upload,
+> order, delete) remains user-initiated. The companion [`SCOPE.md`](../SCOPE.md)
+> records exactly what is implemented today.
+
+---
+
 ## 1. Product Overview
 
 ### 1.1 Purpose
@@ -39,8 +55,10 @@ BinectChrome:
 * Detects PDF downloads (and supported in-browser PDF views)
 * Offers a **“Send PDF to Binect”** action
 * Securely transfers the PDF to Binect via its API
-* Requires explicit user intent
-* Stores no PDF content
+* Tracks each sent document through its Binect lifecycle (in-basket → ordered → in production → sent) and lets the user place the print/delivery **order** with explicit confirmation
+* Reconciles local records against the documents Binect actually holds (server sync)
+* Requires explicit user intent for every send, order, and delete
+* Stores no PDF content — only lightweight metadata proxies
 * Tracks transfers locally for transparency and support
 
 ---
@@ -133,6 +151,54 @@ PDFs rendered via blob URLs or complex JavaScript viewers may not be detectable
 
 ---
 
+### 4.3a Document Lifecycle & Ordering
+
+*(Added in the 2026-06 reconciliation. Distinguishes "uploaded to Binect" from
+"actually sent as physical mail," which the original PRD conflated.)*
+
+#### 4.3a.1 Document Proxies (MUST)
+
+* Each detected/sent PDF is represented locally by a **proxy**: a metadata-only
+  record (filename, size, source, content hash, Binect document ID, status). The
+  proxy **never contains PDF content**.
+* Proxies are deduplicated by filename + content hash so the same document is not
+  tracked twice.
+
+#### 4.3a.2 Lifecycle States (MUST)
+
+* A proxy carries a status mirroring the Binect server lifecycle:
+  `pending` → `uploading` → `in_basket` (uploaded, shippable) → `ordering` →
+  `in_production` → `sent`, plus the off-path states `failed` and `canceled`.
+* The popup groups documents by lifecycle stage and shows the current status,
+  price (when known), and recipient where available.
+
+#### 4.3a.3 Ordering / Dispatch (MUST)
+
+* Uploading a PDF places it in the Binect **basket** (shippable) but does **not**
+  send physical mail.
+* Physically sending requires a **separate, explicit user action** ("order") with
+  clear confirmation, because dispatch costs money and is irreversible.
+
+#### 4.3a.4 Erroneous Documents (SHOULD)
+
+* If Binect reports a document as erroneous, the extension surfaces the error and
+  offers to refresh its status (the server may resolve it) or delete it.
+
+#### 4.3a.5 Status Refresh & Server Sync (SHOULD)
+
+* The user can refresh a document's status on demand.
+* The extension can **sync from the server**: list the documents Binect actually
+  holds and reconcile them with local proxies — adopting server-discovered
+  documents, updating statuses, and clearing server fields for documents deleted
+  upstream.
+
+#### 4.3a.6 Server-Side Deletion (MUST for delete actions)
+
+* Deleting a document from Binect requires explicit user action; on success the
+  local proxy is archived rather than silently dropped.
+
+---
+
 ### 4.4 Authentication & Credential Handling
 
 #### 4.4.1 Authentication Method (MUST)
@@ -198,8 +264,18 @@ Tracking data stored **locally only**:
 
 #### 4.6.3 Retention (SHOULD)
 
-* Cap number of entries (e.g. last 500 events)
-* Prevent unbounded growth
+* Cap number of entries to prevent unbounded growth.
+* **Two distinct stores exist** after the lifecycle reconciliation, each capped
+  independently:
+
+  * **Document proxy queue** (active lifecycle records): live vs. archived views;
+    archived proxies age out after ~30 days; capped at ~100 entries.
+  * **Tracking log** ("Score", append-only transfer events for transparency/CSV
+    export): capped at the last ~500 events.
+
+  *(The original PRD named a single "≤ 500 events" cap. The implementation
+  splits short-lived lifecycle proxies from the long-lived transfer log; the
+  numbers above reflect the shipped behavior and may be tuned.)*
 
 ---
 
@@ -304,8 +380,11 @@ Expected permissions include:
 
 ---
 
-**BinectChrome** is intentionally modest in scope:
-a focused, trustworthy bridge between modern cloud software and physical mail — implemented where the user already works: the browser.
-
-
-xxx
+**BinectChrome** is intentionally focused in scope:
+a trustworthy bridge between modern cloud software and physical mail —
+implemented where the user already works: the browser. It has grown from a pure
+*detect → send* tool into one that also follows each document through its Binect
+lifecycle, but it has not crossed its founding boundaries: no stored documents,
+no backend, no automatic dispatch. Those boundaries are recorded as inviolable
+principles in [`INTENT.md`](../INTENT.md), and the concrete delivered surface in
+[`SCOPE.md`](../SCOPE.md).

@@ -8,7 +8,7 @@ import { uploadPDF, testConnection, BinectAPIError, Document } from '../utils/bi
 import { fetchPDFBytes, DetectedPDF } from '../utils/pdf-detector';
 import { addTrackingEntry } from '../tracking/tracker';
 import { DocumentProxy, PDFQueueEntry, PDFStatus, PDFStatusMeta } from '../utils/pdf-queue';
-import { computeMD5 } from '../utils/hash';
+import { computeContentFingerprint } from '../utils/hash';
 
 // DOM Elements
 const authView = document.getElementById('authView')!;
@@ -827,8 +827,8 @@ async function handleSendPDF(id: string) {
 // Fetch PDF bytes
 const pdfBytes = await fetchPDFBytes(pdf.url);
 
-// Compute content hash for deduplication
-const contentHash = await computeMD5(pdfBytes);
+// Compute content fingerprint for deduplication
+const contentHash = await computeContentFingerprint(pdfBytes);
 
 // Upload to Binect with credentials
 const document = await uploadPDF(

@@ -3,17 +3,16 @@
  */
 
 /**
- * Compute MD5 hash of an ArrayBuffer using Web Crypto API
- * Falls back to a simple hash if crypto.subtle is unavailable
+ * Compute a fast, non-cryptographic content fingerprint for an ArrayBuffer.
+ *
+ * This is NOT a cryptographic hash (not MD5/SHA): it samples the bytes and
+ * combines them with the file size. It is used only for deduplicating detected
+ * PDFs, never for security. Returns a `${sizeHex}-${hashHex}` fingerprint.
  */
-export async function computeMD5(data: ArrayBuffer): Promise<string> {
-  // Web Crypto API doesn't support MD5 (it's not cryptographically secure)
-  // We'll use a simple but fast hash for content identification
-  // This is fine for deduplication purposes
+export async function computeContentFingerprint(data: ArrayBuffer): Promise<string> {
   const bytes = new Uint8Array(data);
 
-  // Use a combination of length and sampled bytes for fast hashing
-  // For true MD5, we'd need a library, but this is sufficient for deduplication
+  // Sample bytes (not the full buffer) and fold them together for speed.
   let hash = 0;
   const sampleSize = Math.min(bytes.length, 10000); // Sample first 10KB
   const step = Math.max(1, Math.floor(bytes.length / sampleSize));
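
The hunk ends before the sampling loop, so the rest of the function is not visible here. The sketch below shows one way a sampled fingerprint of this shape could be completed. The loop body (a 31-multiplier fold kept to 32 bits) and the synchronous signature are assumptions, not the committed code; only the `sampleSize`/`step` setup and the `${sizeHex}-${hashHex}` output format come from the diff. `fingerprintSketch` is a hypothetical name.

```typescript
// Assumed completion of a sampled, non-cryptographic fingerprint.
// NOT suitable for security: files of equal size that agree at every
// sampled position produce the same fingerprint.
function fingerprintSketch(data: ArrayBuffer): string {
  const bytes = new Uint8Array(data);

  let hash = 0;
  const sampleSize = Math.min(bytes.length, 10000);
  const step = Math.max(1, Math.floor(bytes.length / sampleSize));

  // Visit every `step`-th byte and fold it in: hash = hash * 31 + byte,
  // truncated to a signed 32-bit integer by the `| 0`.
  for (let i = 0; i < bytes.length; i += step) {
    hash = ((hash << 5) - hash + bytes[i]) | 0;
  }

  const sizeHex = bytes.length.toString(16);
  const hashHex = (hash >>> 0).toString(16); // render as unsigned hex
  return sizeHex + '-' + hashHex;
}
```

With `step` computed this way, the samples are spread across the whole buffer rather than taken only from its first 10 KB, which matters for PDFs that share a common header. Including the size in the fingerprint cheaply separates files that differ in length.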