feat: start mailbox evidence scanner

2026-06-02 01:19:09 +02:00
parent 8292ffe41d
commit 8532583182
26 changed files with 1733 additions and 18 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -171,6 +171,9 @@ cython_debug/
 # Ruff stuff:
 .ruff_cache/

+# email-connect local runtime artifacts
+.email-connect/
+reports/*.csv
+
 # PyPI configuration file
 .pypirc
-
--- a/README.md
+++ b/README.md
@@ -1,3 +1,34 @@
-# repo-seed
+# email-connect

-A git repository template to bootstrap coulomb projects from.
+Headless, provider-neutral email communication and evidence service.
+
+The first implementation slice is the Mailbox Evidence Scanner MVP: scan a
+return mailbox or fixture directory, classify inbound email-channel evidence,
+store scan state locally, and generate timestamped CSV reports without
+overclaiming delivery, awareness, or coordination success.
+
+## Quickstart
+
+```bash
+PYTHONPATH=src python3 -m unittest discover -s tests
+PYTHONPATH=src python3 -m email_connect.cli adapter-descriptor
+PYTHONPATH=src python3 -m email_connect.cli scan-mailbox --config config/mailbox.example.yml --out reports/
+```
+
+The example config uses `tests/fixtures/mailbox` as a mailbox source. Runtime
+state is written to `.email-connect/state.sqlite`; generated CSV reports are
+written to `reports/`.
+
+## Current Scope
+
+- Coordination-engine spec review and references.
+- Initial adapter descriptor, capability profile, evidence ceiling, and
+  limitations.
+- Conservative mailbox message parser and evidence mapper.
+- SQLite state store with message and evidence deduplication.
+- CSV report generation.
+- Golden fixture tests for hard bounce, soft bounce, out-of-office, and human
+  reply signals.
+
+IMAP access, provider webhooks, outbound sending, suppression workflows, and a
+UI are planned work, not complete in this first slice.
--- a/config/mailbox.example.yml
+++ b/config/mailbox.example.yml
@@ -0,0 +1,31 @@
+mailbox:
+  id: return-mailbox-default
+  protocol: fixture
+  host: null
+  port: 993
+  tls: true
+  username_env: EMAIL_CONNECT_IMAP_USER
+  password_env: EMAIL_CONNECT_IMAP_PASSWORD
+  folder: INBOX
+
+source:
+  fixture_dir: tests/fixtures/mailbox
+
+scan:
+  mode: incremental
+  max_messages_per_run: 5000
+  since: null
+  include_seen: true
+  mark_seen: false
+  store_raw_headers: true
+  store_raw_body: false
+  store_raw_message_ref: true
+
+storage:
+  path: .email-connect/state.sqlite
+
+reports:
+  output_dir: reports
+  include_all_evidence: true
+  include_unknown_messages: true
+  timestamp_timezone: UTC
--- a/docs/coordination-engine-email-spec-review.md
+++ b/docs/coordination-engine-email-spec-review.md
@@ -0,0 +1,46 @@
+# coordination-engine Email Spec Review
+
+## Source Specs
+
+Reviewed local coordination-engine specifications:
+
+- `/home/worsch/coordination-engine/spec/EmailAdapterSpecification.md`
+- `/home/worsch/coordination-engine/spec/AdapterInterfaceSpecification.md`
+- `/home/worsch/coordination-engine/spec/RuntimeArchitectureAndAdapterSubsystem.md`
+- `/home/worsch/coordination-engine/spec/ProductRequirementsDocument.md`
+
+These are referenced, not copied, so `email-connect` stays aligned with the
+authoritative coordination-engine checkout.
+
+## Contract Points For email-connect
+
+- `email-connect` is a `notification`, `communication`, and `interaction`
+  adapter.
+- It reports email-channel evidence and advisory assessment. It does not own
+  coordination case result evaluation.
+- Ambiguous provider or mailbox signals must map to the weakest safe normalized
+  event.
+- Provider and mailbox events must preserve raw references and native status
+  mappings.
+- The adapter must expose or enable production of normalized `EvidenceEvent`
+  records.
+- The adapter should expose endpoint quality diagnostics, but endpoint quality
+  is not coordination success.
+- Email's positive evidence ceiling is limited: email tracking cannot prove
+  human awareness, identity, authority, payload access, or non-repudiation.
+- Golden tests must include overclaim prevention, especially "provider
+  delivered" not becoming awareness or payload delivery.
+
+## First Implementation Implications
+
+- The mailbox scanner emits `EmailEvidenceCandidate` rows that are shaped to
+  become coordination-engine `EvidenceEvent` records later.
+- Classifiers preserve `unknown_return_message` and `parse_failed` instead of
+  silently discarding uncertainty.
+- Out-of-office replies update diagnostics only; they do not prove awareness or
+  reachability.
+- Human replies are email-channel success signals, but not legal acceptance or
+  coordination result satisfaction.
+- CSV reports include event type, assessment category/subclass, confidence,
+  evidence strength, observed time, occurred time, raw reference, and
+  deduplication key for auditability.
--- a/docs/email-evidence-canon.md
+++ b/docs/email-evidence-canon.md
@@ -0,0 +1,48 @@
+# Email Evidence Canon
+
+## Rule
+
+Email events are evidence, not result satisfaction.
+
+The scanner reports email-channel facts and uncertainty. A downstream
+coordination runtime decides whether those facts satisfy a coordination case.
+
+## Initial Message Classes
+
+- `hard_bounce`
+- `soft_bounce`
+- `delayed_delivery_notice`
+- `final_delivery_failure`
+- `out_of_office`
+- `human_reply`
+- `complaint_or_abuse`
+- `unsubscribe_or_opt_out`
+- `challenge_response`
+- `unknown_return_message`
+- `unrelated_message`
+- `parse_failed`
+
+## Initial Normalized Events
+
+| Message class | Normalized event | Assessment |
+| --- | --- | --- |
+| `hard_bounce` | `notification.endpoint.rejected_permanent` | `fail.hard_bounce` |
+| `soft_bounce` | `notification.endpoint.rejected_temporary` | `undef.deferred` |
+| `delayed_delivery_notice` | `notification.endpoint.deferred` | `undef.deferred` |
+| `final_delivery_failure` | `notification.endpoint.rejected_permanent` | `fail.expired_without_delivery` |
+| `out_of_office` | `interaction.out_of_office_received` | `undef.out_of_office` |
+| `human_reply` | `interaction.reply_received` | `success.reply_received` |
+| `complaint_or_abuse` | `notification.channel.complaint_received` | `fail.complaint_received` |
+| `unsubscribe_or_opt_out` | `notification.channel.unsubscribe_received` | `fail.unsubscribed` |
+| `unknown_return_message` | `notification.endpoint.unknown` | `undef.conflicting_evidence` |
+| `challenge_response` | `interaction.unverified_actor_interaction` | `undef.identity_uncertain` |
+
+## Overclaim Prevention
+
+- No bounce found does not mean delivery success.
+- Provider acceptance does not mean endpoint acceptance.
+- Endpoint acceptance does not mean inbox placement.
+- Out-of-office does not prove recipient awareness or action.
+- Human reply does not prove legal acceptance.
+- Unknown return messages remain visible.
+- Scanner and proxy interactions must stay below identity-bound interaction.
--- a/docs/initial-runtime-architecture.md
+++ b/docs/initial-runtime-architecture.md
@@ -0,0 +1,98 @@
+# Initial Runtime Architecture
+
+## Status
+
+This is the first implementation architecture for the mailbox evidence scanner
+slice. It is intentionally small and stdlib-only so the repo can run before a
+larger service stack is chosen.
+
+## Service Boundary
+
+The first slice is a CLI scanner:
+
+```text
+email-connect scan-mailbox --config config/mailbox.example.yml --out reports/
+```
+
+It scans an inbound return mailbox source, classifies messages, stores scan
+state in SQLite, and writes timestamped CSV evidence reports.
+
+The initial source implementation supports fixture directories. The IMAP
+connector remains the next mailbox boundary to complete under
+`EMAIL-WP-0002-T02`.
+
+## Package Layout
+
+```text
+src/email_connect/
+  adapter_contract.py  # coordination-engine descriptor and evidence ceiling
+  cli.py               # command line entry points
+  config.py            # config loading
+  evidence.py          # native class to normalized evidence mapping
+  models.py            # mailbox, parse, evidence, endpoint quality dataclasses
+  parser.py            # MIME/header parsing and conservative classification
+  reporting.py         # CSV report generation
+  scanner.py           # scan orchestration
+  storage.py           # SQLite state store
+```
+
+## Persistence
+
+SQLite is the MVP store. The initial schema includes:
+
+- `mailbox_scans`
+- `mailbox_messages`
+- `parsed_messages`
+- `evidence_candidates`
+
+Message deduplication is keyed by mailbox ID, IMAP UID when present, message ID,
+received timestamp, sender, subject hash, and body hash. Evidence
+deduplication follows the workplan fields: message, parser version, normalized
+event, affected recipient, original message, SMTP/enhanced status, and reason.
+
+## Evidence Mapping
+
+Parser output is represented as `ParsedMailboxMessage`. The mapper converts it
+to `EmailEvidenceCandidate` using coordination-engine event names and advisory
+assessment classes.
+
+Examples:
+
+- `hard_bounce` -> `notification.endpoint.rejected_permanent`
+- `soft_bounce` -> `notification.endpoint.rejected_temporary`
+- `out_of_office` -> `interaction.out_of_office_received`
+- `human_reply` -> `interaction.reply_received`
+
+The mapper does not emit evidence for unrelated messages. Unknown return
+messages stay visible as `notification.endpoint.unknown`.
+
+## coordination-engine Alignment
+
+The implementation keeps these coordination-engine concepts explicit:
+
+- adapter descriptor
+- adapter capability profile
+- evidence ceiling
+- advisory assessment
+- endpoint quality update shape
+- event observation and raw reference preservation
+- golden tests for overclaim prevention
+
+Email evidence remains below the coordination result layer. The scanner does
+not infer inbox placement, human awareness, legal acceptance, payload access, or
+case success.
+
+## Provider Boundary
+
+Provider webhook ingestion and outbound send APIs are deliberately outside this
+slice. The mailbox scanner uses the same evidence model so future provider
+events can enter through a parallel ingestion path and converge at the same
+normalization layer.
+
+## Development Commands
+
+```bash
+PYTHONPATH=src python3 -m unittest discover -s tests
+PYTHONPATH=src python3 -m email_connect.cli adapter-descriptor
+PYTHONPATH=src python3 -m email_connect.cli scan-mailbox --config config/mailbox.example.yml --out reports/
+```
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -0,0 +1,19 @@
+[build-system]
+requires = ["setuptools>=68"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "email-connect"
+version = "0.1.0"
+description = "Provider-neutral email communication and evidence service."
+readme = "README.md"
+requires-python = ">=3.11"
+license = { file = "LICENSE" }
+authors = [{ name = "Coulomb" }]
+dependencies = []
+
+[project.scripts]
+email-connect = "email_connect.cli:main"
+
+[tool.setuptools.packages.find]
+where = ["src"]
--- a/reports/.gitkeep
+++ b/reports/.gitkeep
@@ -0,0 +1 @@
+
--- a/src/email_connect/init.py
+++ b/src/email_connect/init.py
@@ -0,0 +1,3 @@
+"""email-connect package."""
+
+__version__ = "0.1.0"
--- a/src/email_connect/adapter_contract.py
+++ b/src/email_connect/adapter_contract.py
@@ -0,0 +1,96 @@
+from __future__ import annotations
+
+ADAPTER_ID = "email-connect"
+ADAPTER_CONTRACT_VERSION = "1.1"
+
+EMITTED_EVENT_TYPES = [
+    "notification.endpoint.rejected_permanent",
+    "notification.endpoint.rejected_temporary",
+    "notification.endpoint.deferred",
+    "notification.channel.complaint_received",
+    "notification.channel.unsubscribe_received",
+    "interaction.reply_received",
+    "interaction.out_of_office_received",
+    "notification.endpoint.unknown",
+]
+
+
+def adapter_descriptor() -> dict:
+    """Return the initial coordination-engine adapter descriptor."""
+
+    return {
+        "adapter_id": ADAPTER_ID,
+        "adapter_name": "email-connect",
+        "adapter_version": "0.1.0",
+        "adapter_contract_version": ADAPTER_CONTRACT_VERSION,
+        "adapter_types": ["notification", "communication", "interaction"],
+        "deployment_mode": "external",
+        "capability_profile": {
+            "can_notify": True,
+            "can_publish": False,
+            "can_deliver_payload": False,
+            "can_collect_payload": False,
+            "can_grant_access": False,
+            "can_revoke_access": False,
+            "can_observe_interaction": True,
+            "can_observe_identity": False,
+            "can_observe_authority": False,
+            "can_emit_delivery_receipts": False,
+            "can_emit_display_or_read_receipts": False,
+            "can_emit_return_or_failure_events": True,
+            "can_emit_late_events": True,
+            "can_cancel_after_dispatch": False,
+            "supports_webhooks": False,
+            "supports_polling": True,
+            "supports_idempotency": "adapter_managed",
+            "supports_endpoint_quality": True,
+            "supports_native_status_mapping": True,
+            "supports_golden_tests": True,
+            "limitations": [
+                "Mailbox evidence cannot prove inbox placement.",
+                "Reply, open, and click signals do not prove coordination result satisfaction.",
+            ],
+        },
+        "supported_actions": [],
+        "emitted_event_types": EMITTED_EVENT_TYPES,
+        "supported_channels": ["email"],
+        "supported_endpoint_types": ["email_address"],
+        "evidence_profile": {
+            "native_status_mapping_ref": "src/email_connect/evidence.py",
+            "golden_tests_ref": "tests/fixtures",
+        },
+        "evidence_ceiling": {
+            "max_positive_event": "interaction.reply_received",
+            "max_positive_strength": "medium",
+            "can_prove_human_awareness": False,
+            "can_prove_payload_access": False,
+            "can_prove_payload_delivery": False,
+            "can_prove_identity": False,
+            "can_prove_authority": False,
+            "can_prove_non_repudiation": False,
+            "limitations": [
+                "Email cannot reliably prove intended-recipient awareness.",
+                "Mailbox scanning observes return-channel evidence only.",
+            ],
+        },
+        "assurance_capability": {
+            "awareness_assurance": "weak",
+            "delivery_assurance": "none",
+            "identity_assurance": "none",
+            "authority_assurance": "none",
+            "non_repudiation_assurance": "none",
+        },
+        "identity_profile": {
+            "can_identify_actor": False,
+            "identity_limitations": ["Inbound mailbox evidence is not authenticated actor evidence."],
+        },
+        "late_event_policy": {
+            "accepts_late_events": True,
+            "late_event_notes": ["Bounces and replies may arrive after earlier acceptance evidence."],
+        },
+        "limitations": [
+            "No outbound sending in the first mailbox-scanner MVP.",
+            "No provider webhook integration in the first mailbox-scanner MVP.",
+            "No legal delivery assessment.",
+        ],
+    }
--- a/src/email_connect/cli.py
+++ b/src/email_connect/cli.py
@@ -0,0 +1,55 @@
+from __future__ import annotations
+
+import argparse
+import json
+from pathlib import Path
+
+from .adapter_contract import adapter_descriptor
+from .config import load_config
+from .scanner import scan_mailbox
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(prog="email-connect")
+    sub = parser.add_subparsers(dest="command", required=True)
+
+    scan = sub.add_parser("scan-mailbox", help="Scan a return mailbox or fixture directory.")
+    scan.add_argument("--config", required=True)
+    scan.add_argument("--out", default=None)
+    scan.add_argument("--full-rescan", action="store_true")
+    scan.add_argument("--since", default=None)
+    scan.add_argument("--report-only-new", action="store_true")
+    scan.add_argument("--dry-run", action="store_true")
+    scan.add_argument("--fixture-dir", default=None)
+
+    sub.add_parser("adapter-descriptor", help="Print the initial coordination-engine adapter descriptor.")
+
+    args = parser.parse_args(argv)
+    if args.command == "adapter-descriptor":
+        print(json.dumps(adapter_descriptor(), indent=2, sort_keys=True))
+        return 0
+
+    if args.command == "scan-mailbox":
+        config = load_config(args.config)
+        result = scan_mailbox(
+            config,
+            output_dir=args.out,
+            full_rescan=args.full_rescan,
+            report_only_new=args.report_only_new,
+            dry_run=args.dry_run,
+            fixture_dir=args.fixture_dir,
+        )
+        print(f"scan_id={result.scan.scan_id}")
+        print(f"messages_seen={result.scan.messages_seen}")
+        print(f"messages_new={result.scan.messages_new}")
+        print(f"messages_parsed={result.scan.messages_parsed}")
+        print(f"evidence_events_created={result.scan.evidence_events_created}")
+        if result.report_path:
+            print(f"report_path={Path(result.report_path)}")
+        return 0
+
+    return 2
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
--- a/src/email_connect/config.py
+++ b/src/email_connect/config.py
@@ -0,0 +1,140 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Any
+
+
+@dataclass(frozen=True)
+class MailboxConfig:
+    id: str
+    protocol: str = "imap"
+    host: str | None = None
+    port: int = 993
+    tls: bool = True
+    username_env: str | None = None
+    password_env: str | None = None
+    folder: str = "INBOX"
+
+
+@dataclass(frozen=True)
+class ScanConfig:
+    mode: str = "incremental"
+    max_messages_per_run: int = 5000
+    since: str | None = None
+    include_seen: bool = True
+    mark_seen: bool = False
+    store_raw_headers: bool = True
+    store_raw_body: bool = False
+    store_raw_message_ref: bool = True
+
+
+@dataclass(frozen=True)
+class StorageConfig:
+    path: str = ".email-connect/state.sqlite"
+
+
+@dataclass(frozen=True)
+class ReportsConfig:
+    output_dir: str = "reports"
+    include_all_evidence: bool = True
+    include_unknown_messages: bool = True
+    timestamp_timezone: str = "UTC"
+
+
+@dataclass(frozen=True)
+class SourceConfig:
+    fixture_dir: str | None = None
+
+
+@dataclass(frozen=True)
+class AppConfig:
+    mailbox: MailboxConfig
+    scan: ScanConfig
+    storage: StorageConfig
+    reports: ReportsConfig
+    source: SourceConfig = SourceConfig()
+
+
+def load_config(path: str | Path) -> AppConfig:
+    data = _load_mapping(Path(path))
+    mailbox = data.get("mailbox", {})
+    scan = data.get("scan", {})
+    storage = data.get("storage", {})
+    reports = data.get("reports", {})
+    source = data.get("source", {})
+    return AppConfig(
+        mailbox=MailboxConfig(
+            id=str(mailbox.get("id", "return-mailbox-default")),
+            protocol=str(mailbox.get("protocol", "imap")),
+            host=mailbox.get("host"),
+            port=int(mailbox.get("port", 993)),
+            tls=bool(mailbox.get("tls", True)),
+            username_env=mailbox.get("username_env"),
+            password_env=mailbox.get("password_env"),
+            folder=str(mailbox.get("folder", "INBOX")),
+        ),
+        scan=ScanConfig(
+            mode=str(scan.get("mode", "incremental")),
+            max_messages_per_run=int(scan.get("max_messages_per_run", 5000)),
+            since=scan.get("since"),
+            include_seen=bool(scan.get("include_seen", True)),
+            mark_seen=bool(scan.get("mark_seen", False)),
+            store_raw_headers=bool(scan.get("store_raw_headers", True)),
+            store_raw_body=bool(scan.get("store_raw_body", False)),
+            store_raw_message_ref=bool(scan.get("store_raw_message_ref", True)),
+        ),
+        storage=StorageConfig(path=str(storage.get("path", ".email-connect/state.sqlite"))),
+        reports=ReportsConfig(
+            output_dir=str(reports.get("output_dir", "reports")),
+            include_all_evidence=bool(reports.get("include_all_evidence", True)),
+            include_unknown_messages=bool(reports.get("include_unknown_messages", True)),
+            timestamp_timezone=str(reports.get("timestamp_timezone", "UTC")),
+        ),
+        source=SourceConfig(fixture_dir=source.get("fixture_dir")),
+    )
+
+
+def _load_mapping(path: Path) -> dict[str, Any]:
+    text = path.read_text(encoding="utf-8")
+    try:
+        import yaml  # type: ignore
+
+        loaded = yaml.safe_load(text)
+        return loaded or {}
+    except ModuleNotFoundError:
+        return _parse_simple_yaml(text)
+
+
+def _parse_simple_yaml(text: str) -> dict[str, Any]:
+    """Parse the small YAML subset used by config/mailbox.example.yml."""
+
+    result: dict[str, Any] = {}
+    current: dict[str, Any] | None = None
+    for raw_line in text.splitlines():
+        line = raw_line.split("#", 1)[0].rstrip()
+        if not line.strip():
+            continue
+        if not line.startswith(" ") and line.endswith(":"):
+            key = line[:-1].strip()
+            current = {}
+            result[key] = current
+            continue
+        if current is None or ":" not in line:
+            raise ValueError(f"Unsupported YAML line: {raw_line}")
+        key, value = line.strip().split(":", 1)
+        current[key.strip()] = _parse_scalar(value.strip())
+    return result
+
+
+def _parse_scalar(value: str) -> Any:
+    if value == "" or value == "null":
+        return None
+    if value in {"true", "false"}:
+        return value == "true"
+    if value.startswith('"') and value.endswith('"'):
+        return value[1:-1]
+    try:
+        return int(value)
+    except ValueError:
+        return value
--- a/src/email_connect/evidence.py
+++ b/src/email_connect/evidence.py
@@ -0,0 +1,151 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from datetime import datetime
+from uuid import uuid5, NAMESPACE_URL
+
+from .models import (
+    AssessmentCategory,
+    Confidence,
+    EmailEvidenceCandidate,
+    EvidenceStrength,
+    MessageClass,
+    ParsedMailboxMessage,
+)
+
+
+@dataclass(frozen=True)
+class EvidenceMapping:
+    event_type: str
+    assessment_category: AssessmentCategory
+    assessment_subclass: str
+    evidence_strength: EvidenceStrength
+    normalized_meaning: str
+
+
+EVIDENCE_MAPPINGS: dict[MessageClass, EvidenceMapping | None] = {
+    MessageClass.HARD_BOUNCE: EvidenceMapping(
+        "notification.endpoint.rejected_permanent",
+        AssessmentCategory.FAIL,
+        "fail.hard_bounce",
+        EvidenceStrength.NEGATIVE,
+        "Strong evidence that the recipient endpoint rejected the message permanently.",
+    ),
+    MessageClass.SOFT_BOUNCE: EvidenceMapping(
+        "notification.endpoint.rejected_temporary",
+        AssessmentCategory.UNDEF,
+        "undef.deferred",
+        EvidenceStrength.AMBIGUOUS,
+        "Temporary endpoint failure; retry or later evidence may change interpretation.",
+    ),
+    MessageClass.DELAYED_DELIVERY_NOTICE: EvidenceMapping(
+        "notification.endpoint.deferred",
+        AssessmentCategory.UNDEF,
+        "undef.deferred",
+        EvidenceStrength.AMBIGUOUS,
+        "Delivery is delayed and still uncertain.",
+    ),
+    MessageClass.FINAL_DELIVERY_FAILURE: EvidenceMapping(
+        "notification.endpoint.rejected_permanent",
+        AssessmentCategory.FAIL,
+        "fail.expired_without_delivery",
+        EvidenceStrength.NEGATIVE,
+        "Provider or endpoint gave up after retrying.",
+    ),
+    MessageClass.OUT_OF_OFFICE: EvidenceMapping(
+        "interaction.out_of_office_received",
+        AssessmentCategory.UNDEF,
+        "undef.out_of_office",
+        EvidenceStrength.WEAK,
+        "Mailbox auto-reply was observed; awareness and action remain uncertain.",
+    ),
+    MessageClass.HUMAN_REPLY: EvidenceMapping(
+        "interaction.reply_received",
+        AssessmentCategory.SUCCESS,
+        "success.reply_received",
+        EvidenceStrength.MEDIUM,
+        "A reply-like message was observed on the email channel.",
+    ),
+    MessageClass.COMPLAINT_OR_ABUSE: EvidenceMapping(
+        "notification.channel.complaint_received",
+        AssessmentCategory.FAIL,
+        "fail.complaint_received",
+        EvidenceStrength.NEGATIVE,
+        "Complaint or abuse feedback was observed.",
+    ),
+    MessageClass.UNSUBSCRIBE_OR_OPT_OUT: EvidenceMapping(
+        "notification.channel.unsubscribe_received",
+        AssessmentCategory.FAIL,
+        "fail.unsubscribed",
+        EvidenceStrength.NEGATIVE,
+        "Opt-out or unsubscribe signal was observed.",
+    ),
+    MessageClass.UNKNOWN_RETURN_MESSAGE: EvidenceMapping(
+        "notification.endpoint.unknown",
+        AssessmentCategory.UNDEF,
+        "undef.conflicting_evidence",
+        EvidenceStrength.AMBIGUOUS,
+        "Return-channel message could not be classified reliably.",
+    ),
+    MessageClass.CHALLENGE_RESPONSE: EvidenceMapping(
+        "interaction.unverified_actor_interaction",
+        AssessmentCategory.UNDEF,
+        "undef.identity_uncertain",
+        EvidenceStrength.AMBIGUOUS,
+        "Challenge-response or automated interaction was observed.",
+    ),
+    MessageClass.UNRELATED_MESSAGE: None,
+    MessageClass.PARSE_FAILED: None,
+}
+
+
+def evidence_deduplication_key(parsed: ParsedMailboxMessage, mapping: EvidenceMapping) -> str:
+    parts = [
+        parsed.mailbox_message_id,
+        parsed.parser_version,
+        mapping.event_type,
+        parsed.affected_email_address or "",
+        parsed.original_message_id or "",
+        parsed.smtp_status_code or "",
+        parsed.enhanced_status_code or "",
+        parsed.reason_code or "",
+    ]
+    return "|".join(parts)
+
+
+def candidate_from_parsed(
+    parsed: ParsedMailboxMessage,
+    *,
+    raw_message_ref: str | None,
+    observed_at: datetime,
+    occurred_at: datetime | None,
+) -> EmailEvidenceCandidate | None:
+    mapping = EVIDENCE_MAPPINGS.get(parsed.message_class)
+    if mapping is None:
+        return None
+
+    dedup_key = evidence_deduplication_key(parsed, mapping)
+    candidate_id = str(uuid5(NAMESPACE_URL, "email-connect:evidence:" + dedup_key))
+    return EmailEvidenceCandidate(
+        evidence_candidate_id=candidate_id,
+        mailbox_message_id=parsed.mailbox_message_id,
+        parsed_message_id=parsed.parsed_message_id,
+        event_type=mapping.event_type,
+        assessment_category=mapping.assessment_category,
+        assessment_subclass=mapping.assessment_subclass,
+        affected_email_address=parsed.affected_email_address,
+        original_message_id=parsed.original_message_id,
+        confidence=parsed.confidence,
+        evidence_strength=mapping.evidence_strength,
+        occurred_at=occurred_at,
+        observed_at=observed_at,
+        deduplication_key=dedup_key,
+        raw_message_ref=raw_message_ref,
+        notes=[mapping.normalized_meaning, *parsed.notes],
+        metadata={
+            "message_class": parsed.message_class.value,
+            "smtp_status_code": parsed.smtp_status_code,
+            "enhanced_status_code": parsed.enhanced_status_code,
+            "reason_code": parsed.reason_code,
+        },
+    )
--- a/src/email_connect/models.py
+++ b/src/email_connect/models.py
@@ -0,0 +1,126 @@
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from datetime import datetime
+from enum import StrEnum
+from typing import Any
+
+
+class MessageClass(StrEnum):
+    HARD_BOUNCE = "hard_bounce"
+    SOFT_BOUNCE = "soft_bounce"
+    DELAYED_DELIVERY_NOTICE = "delayed_delivery_notice"
+    FINAL_DELIVERY_FAILURE = "final_delivery_failure"
+    OUT_OF_OFFICE = "out_of_office"
+    HUMAN_REPLY = "human_reply"
+    COMPLAINT_OR_ABUSE = "complaint_or_abuse"
+    UNSUBSCRIBE_OR_OPT_OUT = "unsubscribe_or_opt_out"
+    CHALLENGE_RESPONSE = "challenge_response"
+    UNKNOWN_RETURN_MESSAGE = "unknown_return_message"
+    UNRELATED_MESSAGE = "unrelated_message"
+    PARSE_FAILED = "parse_failed"
+
+
+class AssessmentCategory(StrEnum):
+    SUCCESS = "success"
+    FAIL = "fail"
+    UNDEF = "undef"
+
+
+class Confidence(StrEnum):
+    LOW = "low"
+    MEDIUM = "medium"
+    HIGH = "high"
+
+
+class EvidenceStrength(StrEnum):
+    NONE = "none"
+    WEAK = "weak"
+    MEDIUM = "medium"
+    STRONG = "strong"
+    NEGATIVE = "negative"
+    AMBIGUOUS = "ambiguous"
+
+
+@dataclass(frozen=True)
+class MailboxScan:
+    scan_id: str
+    mailbox_id: str
+    started_at: datetime
+    finished_at: datetime | None
+    scan_mode: str
+    folder: str
+    status: str
+    messages_seen: int = 0
+    messages_new: int = 0
+    messages_parsed: int = 0
+    evidence_events_created: int = 0
+    report_path: str | None = None
+    since: datetime | None = None
+
+
+@dataclass(frozen=True)
+class InboundMailboxMessage:
+    mailbox_message_id: str
+    mailbox_id: str
+    imap_uid: str | None
+    message_id_header: str | None
+    received_at: datetime | None
+    from_address: str | None
+    to_addresses: list[str]
+    subject: str | None
+    raw_headers_ref: str | None
+    raw_message_ref: str | None
+    first_seen_at: datetime
+    last_seen_at: datetime
+    deduplication_key: str
+
+
+@dataclass(frozen=True)
+class ParsedMailboxMessage:
+    parsed_message_id: str
+    mailbox_message_id: str
+    parser_version: str
+    message_class: MessageClass
+    affected_email_address: str | None
+    original_message_id: str | None
+    original_recipient: str | None
+    smtp_status_code: str | None
+    enhanced_status_code: str | None
+    reason_code: str | None
+    confidence: Confidence
+    parsed_at: datetime
+    notes: list[str] = field(default_factory=list)
+
+
+@dataclass(frozen=True)
+class EmailEvidenceCandidate:
+    evidence_candidate_id: str
+    mailbox_message_id: str
+    parsed_message_id: str
+    event_type: str
+    assessment_category: AssessmentCategory
+    assessment_subclass: str
+    affected_email_address: str | None
+    original_message_id: str | None
+    confidence: Confidence
+    evidence_strength: EvidenceStrength
+    occurred_at: datetime | None
+    observed_at: datetime
+    deduplication_key: str
+    raw_message_ref: str | None
+    notes: list[str] = field(default_factory=list)
+    metadata: dict[str, Any] = field(default_factory=dict)
+
+
+@dataclass(frozen=True)
+class EndpointQualityUpdate:
+    affected_email_address: str
+    reachability: str = "unknown"
+    suppression_state: str = "unknown"
+    last_success_at: datetime | None = None
+    last_failure_at: datetime | None = None
+    last_auto_reply_at: datetime | None = None
+    reason_code: str | None = None
+    confidence: Confidence = Confidence.LOW
+    quality_signals: list[str] = field(default_factory=list)
--- a/src/email_connect/parser.py
+++ b/src/email_connect/parser.py
@@ -0,0 +1,320 @@
+from __future__ import annotations
+
+import hashlib
+import re
+from datetime import UTC, datetime
+from email import policy
+from email.parser import BytesParser
+from email.utils import getaddresses, parsedate_to_datetime
+from pathlib import Path
+from uuid import NAMESPACE_URL, uuid5
+
+from .evidence import candidate_from_parsed
+from .models import (
+    Confidence,
+    EmailEvidenceCandidate,
+    InboundMailboxMessage,
+    MessageClass,
+    ParsedMailboxMessage,
+)
+
+PARSER_VERSION = "mailbox-scanner-v0.1"
+
+EMAIL_RE = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}", re.IGNORECASE)
+ENHANCED_STATUS_RE = re.compile(r"\b[245]\.\d{1,3}\.\d{1,3}\b")
+SMTP_STATUS_RE = re.compile(r"\b[245]\d{2}\b")
+
+
+def parse_message_bytes(
+    raw_bytes: bytes,
+    *,
+    mailbox_id: str,
+    raw_message_ref: str | None,
+    imap_uid: str | None = None,
+    now: datetime | None = None,
+) -> tuple[InboundMailboxMessage, ParsedMailboxMessage, EmailEvidenceCandidate | None]:
+    observed_at = now or datetime.now(UTC)
+    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
+
+    message_id = _clean_header(msg.get("Message-ID"))
+    subject = _clean_header(msg.get("Subject"))
+    from_address = _first_address(msg.get("From"))
+    to_addresses = [addr for _, addr in getaddresses(msg.get_all("To", []))]
+    received_at = _parse_date(msg.get("Date"))
+    text = _extract_text(msg)
+    headers_text = "\n".join(f"{key}: {value}" for key, value in msg.items())
+    combined = f"{headers_text}\n\n{subject or ''}\n{text}"
+    dedup_key = _message_dedup_key(
+        mailbox_id=mailbox_id,
+        imap_uid=imap_uid,
+        message_id=message_id,
+        received_at=received_at,
+        from_address=from_address,
+        subject=subject,
+        body=text,
+    )
+    mailbox_message_id = str(uuid5(NAMESPACE_URL, "email-connect:message:" + dedup_key))
+    inbound = InboundMailboxMessage(
+        mailbox_message_id=mailbox_message_id,
+        mailbox_id=mailbox_id,
+        imap_uid=imap_uid,
+        message_id_header=message_id,
+        received_at=received_at,
+        from_address=from_address,
+        to_addresses=to_addresses,
+        subject=subject,
+        raw_headers_ref=raw_message_ref,
+        raw_message_ref=raw_message_ref,
+        first_seen_at=observed_at,
+        last_seen_at=observed_at,
+        deduplication_key=dedup_key,
+    )
+
+    parsed = classify_message(inbound, combined, observed_at=observed_at)
+    candidate = candidate_from_parsed(
+        parsed,
+        raw_message_ref=raw_message_ref,
+        observed_at=observed_at,
+        occurred_at=received_at,
+    )
+    return inbound, parsed, candidate
+
+
+def parse_message_file(
+    path: str | Path,
+    *,
+    mailbox_id: str,
+    now: datetime | None = None,
+) -> tuple[InboundMailboxMessage, ParsedMailboxMessage, EmailEvidenceCandidate | None]:
+    file_path = Path(path)
+    return parse_message_bytes(
+        file_path.read_bytes(),
+        mailbox_id=mailbox_id,
+        raw_message_ref=str(file_path),
+        now=now,
+    )
+
+
+def classify_message(
+    inbound: InboundMailboxMessage,
+    combined_text: str,
+    *,
+    observed_at: datetime,
+) -> ParsedMailboxMessage:
+    text = combined_text.lower()
+    enhanced_status = _first_match(ENHANCED_STATUS_RE, combined_text)
+    smtp_status = _first_match(SMTP_STATUS_RE, combined_text)
+    affected = _extract_affected_recipient(combined_text)
+    original_message_id = _extract_headerish(combined_text, "Original-Message-ID")
+    notes: list[str] = []
+
+    message_class = MessageClass.UNRELATED_MESSAGE
+    confidence = Confidence.LOW
+    reason_code: str | None = None
+
+    if _contains_any(text, ["feedback loop", "abuse report", "spam complaint", "complaint notification"]):
+        message_class = MessageClass.COMPLAINT_OR_ABUSE
+        confidence = Confidence.HIGH
+        reason_code = "complaint"
+    elif _contains_any(text, ["unsubscribe", "opt-out", "opt out", "remove me"]):
+        message_class = MessageClass.UNSUBSCRIBE_OR_OPT_OUT
+        confidence = Confidence.MEDIUM
+        reason_code = "unsubscribe"
+    elif _is_out_of_office(text):
+        message_class = MessageClass.OUT_OF_OFFICE
+        confidence = Confidence.MEDIUM
+        reason_code = "auto_reply"
+    elif _contains_any(text, ["will keep trying", "delivery delayed", "message delayed", "not yet delivered"]):
+        message_class = MessageClass.DELAYED_DELIVERY_NOTICE
+        confidence = Confidence.HIGH
+        reason_code = "delayed"
+    elif _is_dsn_like(text):
+        message_class, confidence, reason_code = _classify_dsn(text, smtp_status, enhanced_status)
+    elif _looks_like_human_reply(inbound, text):
+        message_class = MessageClass.HUMAN_REPLY
+        confidence = Confidence.MEDIUM
+        reason_code = "reply"
+    elif _looks_return_related(text):
+        message_class = MessageClass.UNKNOWN_RETURN_MESSAGE
+        confidence = Confidence.LOW
+        reason_code = "unknown_return"
+        notes.append("Return-channel message did not match a reliable classifier.")
+
+    if enhanced_status:
+        notes.append(f"enhanced_status={enhanced_status}")
+    if smtp_status:
+        notes.append(f"smtp_status={smtp_status}")
+
+    parsed_id_basis = "|".join([inbound.mailbox_message_id, PARSER_VERSION, message_class.value])
+    return ParsedMailboxMessage(
+        parsed_message_id=str(uuid5(NAMESPACE_URL, "email-connect:parsed:" + parsed_id_basis)),
+        mailbox_message_id=inbound.mailbox_message_id,
+        parser_version=PARSER_VERSION,
+        message_class=message_class,
+        affected_email_address=affected,
+        original_message_id=original_message_id,
+        original_recipient=affected,
+        smtp_status_code=smtp_status,
+        enhanced_status_code=enhanced_status,
+        reason_code=reason_code,
+        confidence=confidence,
+        parsed_at=observed_at,
+        notes=notes,
+    )
+
+
+def _classify_dsn(
+    text: str,
+    smtp_status: str | None,
+    enhanced_status: str | None,
+) -> tuple[MessageClass, Confidence, str]:
+    if _contains_any(text, ["final failure", "giving up", "could not deliver after"]):
+        return MessageClass.FINAL_DELIVERY_FAILURE, Confidence.HIGH, "expired_without_delivery"
+    if (enhanced_status and enhanced_status.startswith("5.")) or (smtp_status and smtp_status.startswith("5")):
+        return MessageClass.HARD_BOUNCE, Confidence.HIGH, "hard_bounce"
+    if _contains_any(text, ["unknown user", "mailbox not found", "user unknown", "domain not found"]):
+        return MessageClass.HARD_BOUNCE, Confidence.HIGH, "hard_bounce"
+    if (enhanced_status and enhanced_status.startswith("4.")) or (smtp_status and smtp_status.startswith("4")):
+        return MessageClass.SOFT_BOUNCE, Confidence.HIGH, "soft_bounce"
+    if _contains_any(text, ["temporary failure", "mailbox full", "greylist", "try again later"]):
+        return MessageClass.SOFT_BOUNCE, Confidence.MEDIUM, "soft_bounce"
+    return MessageClass.UNKNOWN_RETURN_MESSAGE, Confidence.LOW, "unknown_dsn"
+
+
+def _extract_text(msg) -> str:
+    chunks: list[str] = []
+    if msg.is_multipart():
+        for part in msg.walk():
+            if part.get_content_maintype() == "multipart":
+                continue
+            if part.get_content_type() in {"text/plain", "message/delivery-status"}:
+                try:
+                    chunks.append(str(part.get_content()))
+                except Exception:
+                    continue
+    else:
+        try:
+            chunks.append(str(msg.get_content()))
+        except Exception:
+            payload = msg.get_payload(decode=True)
+            if payload:
+                chunks.append(payload.decode(errors="replace"))
+    return "\n".join(chunks)
+
+
+def _message_dedup_key(
+    *,
+    mailbox_id: str,
+    imap_uid: str | None,
+    message_id: str | None,
+    received_at: datetime | None,
+    from_address: str | None,
+    subject: str | None,
+    body: str,
+) -> str:
+    body_hash = hashlib.sha256(body.encode("utf-8", errors="replace")).hexdigest()[:16]
+    return "|".join(
+        [
+            mailbox_id,
+            imap_uid or "",
+            message_id or "",
+            received_at.isoformat() if received_at else "",
+            from_address or "",
+            hashlib.sha256((subject or "").encode()).hexdigest()[:12],
+            body_hash,
+        ]
+    )
+
+
+def _clean_header(value: str | None) -> str | None:
+    if value is None:
+        return None
+    cleaned = str(value).strip()
+    return cleaned or None
+
+
+def _first_address(value: str | None) -> str | None:
+    if not value:
+        return None
+    addresses = getaddresses([value])
+    return addresses[0][1] if addresses else None
+
+
+def _parse_date(value: str | None) -> datetime | None:
+    if not value:
+        return None
+    try:
+        parsed = parsedate_to_datetime(value)
+    except (TypeError, ValueError):
+        return None
+    if parsed.tzinfo is None:
+        return parsed.replace(tzinfo=UTC)
+    return parsed.astimezone(UTC)
+
+
+def _first_match(pattern: re.Pattern[str], text: str) -> str | None:
+    match = pattern.search(text)
+    return match.group(0) if match else None
+
+
+def _extract_headerish(text: str, name: str) -> str | None:
+    match = re.search(rf"^{re.escape(name)}:\s*(.+)$", text, re.IGNORECASE | re.MULTILINE)
+    return match.group(1).strip() if match else None
+
+
+def _extract_affected_recipient(text: str) -> str | None:
+    for name in ["Final-Recipient", "Original-Recipient", "X-Failed-Recipients", "Failed-Recipient"]:
+        value = _extract_headerish(text, name)
+        if value:
+            match = EMAIL_RE.search(value)
+            if match:
+                return match.group(0).lower()
+    match = EMAIL_RE.search(text)
+    return match.group(0).lower() if match else None
+
+
+def _contains_any(text: str, needles: list[str]) -> bool:
+    return any(needle in text for needle in needles)
+
+
+def _is_dsn_like(text: str) -> bool:
+    return _contains_any(
+        text,
+        [
+            "delivery status notification",
+            "delivery failure",
+            "undeliverable",
+            "returned mail",
+            "mail delivery subsystem",
+            "final-recipient",
+            "diagnostic-code",
+            "status:",
+        ],
+    )
+
+
+def _is_out_of_office(text: str) -> bool:
+    return _contains_any(
+        text,
+        [
+            "out of office",
+            "auto-reply",
+            "autoreply",
+            "automatic reply",
+            "vacation",
+            "abwesenheitsnotiz",
+            "nicht im buero",
+            "nicht im büro",
+        ],
+    )
+
+
+def _looks_like_human_reply(inbound: InboundMailboxMessage, text: str) -> bool:
+    subject = (inbound.subject or "").lower()
+    if _contains_any(text, ["auto-submitted: auto-replied", "x-autoreply", "auto-generated"]):
+        return False
+    return subject.startswith("re:") or _contains_any(text, ["thanks", "thank you", "i confirm", "received"])
+
+
+def _looks_return_related(text: str) -> bool:
+    return _contains_any(text, ["delivery", "mailbox", "recipient", "message", "smtp", "unsubscribe", "reply"])
--- a/src/email_connect/reporting.py
+++ b/src/email_connect/reporting.py
@@ -0,0 +1,107 @@
+from __future__ import annotations
+
+import csv
+import json
+from datetime import UTC, datetime
+from pathlib import Path
+
+REPORT_COLUMNS = [
+    "report_generated_at",
+    "scan_id",
+    "mailbox_id",
+    "mailbox_message_id",
+    "mailbox_received_at",
+    "source_from",
+    "source_to",
+    "source_subject",
+    "message_id_header",
+    "detected_message_class",
+    "normalized_event_type",
+    "assessment_category",
+    "assessment_subclass",
+    "affected_email_address",
+    "original_message_id",
+    "original_recipient",
+    "smtp_status_code",
+    "enhanced_status_code",
+    "reason_code",
+    "confidence",
+    "evidence_strength",
+    "occurred_at",
+    "observed_at",
+    "first_seen_at",
+    "last_seen_at",
+    "deduplication_key",
+    "raw_message_ref",
+    "notes",
+]
+
+
+def report_filename(now: datetime | None = None) -> str:
+    stamp = (now or datetime.now(UTC)).strftime("%Y%m%d-%H%M%S")
+    return f"email-channel-evidence-report-{stamp}.csv"
+
+
+def write_evidence_report(
+    rows: list[dict],
+    *,
+    output_dir: str | Path,
+    scan_id: str,
+    mailbox_id: str,
+    generated_at: datetime | None = None,
+) -> Path:
+    generated = generated_at or datetime.now(UTC)
+    out_dir = Path(output_dir)
+    out_dir.mkdir(parents=True, exist_ok=True)
+    path = out_dir / report_filename(generated)
+
+    with path.open("w", newline="", encoding="utf-8") as fh:
+        writer = csv.DictWriter(fh, fieldnames=REPORT_COLUMNS)
+        writer.writeheader()
+        for row in rows:
+            writer.writerow(_report_row(row, scan_id=scan_id, mailbox_id=mailbox_id, generated_at=generated))
+    return path
+
+
+def _report_row(row: dict, *, scan_id: str, mailbox_id: str, generated_at: datetime) -> dict:
+    metadata = _json(row.get("metadata_json"))
+    notes = _json(row.get("notes_json"))
+    return {
+        "report_generated_at": generated_at.isoformat(),
+        "scan_id": scan_id,
+        "mailbox_id": mailbox_id,
+        "mailbox_message_id": row.get("mailbox_message_id", ""),
+        "mailbox_received_at": row.get("occurred_at") or "",
+        "source_from": metadata.get("source_from", ""),
+        "source_to": metadata.get("source_to", ""),
+        "source_subject": metadata.get("source_subject", ""),
+        "message_id_header": metadata.get("message_id_header", ""),
+        "detected_message_class": metadata.get("message_class", ""),
+        "normalized_event_type": row.get("event_type", ""),
+        "assessment_category": row.get("assessment_category", ""),
+        "assessment_subclass": row.get("assessment_subclass", ""),
+        "affected_email_address": row.get("affected_email_address") or "",
+        "original_message_id": row.get("original_message_id") or "",
+        "original_recipient": metadata.get("original_recipient", ""),
+        "smtp_status_code": metadata.get("smtp_status_code") or "",
+        "enhanced_status_code": metadata.get("enhanced_status_code") or "",
+        "reason_code": metadata.get("reason_code") or "",
+        "confidence": row.get("confidence", ""),
+        "evidence_strength": row.get("evidence_strength", ""),
+        "occurred_at": row.get("occurred_at") or "",
+        "observed_at": row.get("observed_at") or "",
+        "first_seen_at": metadata.get("first_seen_at", ""),
+        "last_seen_at": metadata.get("last_seen_at", ""),
+        "deduplication_key": row.get("deduplication_key", ""),
+        "raw_message_ref": row.get("raw_message_ref") or "",
+        "notes": "; ".join(str(item) for item in notes),
+    }
+
+
+def _json(value: str | None) -> dict | list:
+    if not value:
+        return {}
+    try:
+        return json.loads(value)
+    except json.JSONDecodeError:
+        return {}
--- a/src/email_connect/scanner.py
+++ b/src/email_connect/scanner.py
@@ -0,0 +1,107 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from datetime import UTC, datetime
+from pathlib import Path
+from uuid import uuid4
+
+from .config import AppConfig
+from .models import MailboxScan
+from .parser import parse_message_file
+from .reporting import write_evidence_report
+from .storage import StateStore
+
+
+@dataclass(frozen=True)
+class ScanResult:
+    scan: MailboxScan
+    report_path: Path | None
+
+
+def scan_mailbox(
+    config: AppConfig,
+    *,
+    output_dir: str | None = None,
+    full_rescan: bool = False,
+    report_only_new: bool = False,
+    dry_run: bool = False,
+    fixture_dir: str | None = None,
+) -> ScanResult:
+    started_at = datetime.now(UTC)
+    scan_id = str(uuid4())
+    source_dir = fixture_dir or config.source.fixture_dir
+    if not source_dir:
+        raise ValueError("No source.fixture_dir configured yet. IMAP connector is planned in EMAIL-WP-0002-T02.")
+
+    files = sorted(Path(source_dir).glob("*.eml"))
+    if config.scan.max_messages_per_run:
+        files = files[: config.scan.max_messages_per_run]
+
+    store = StateStore(config.storage.path)
+    messages_seen = 0
+    messages_new = 0
+    messages_parsed = 0
+    evidence_created = 0
+    try:
+        for path in files:
+            messages_seen += 1
+            inbound, parsed, candidate = parse_message_file(path, mailbox_id=config.mailbox.id)
+            if dry_run:
+                messages_parsed += 1
+                continue
+            is_new = store.upsert_message(inbound)
+            messages_new += int(is_new)
+            store.insert_parsed(parsed)
+            messages_parsed += 1
+            if candidate is not None:
+                candidate = _enrich_candidate(candidate, inbound, parsed)
+                evidence_created += int(store.insert_evidence(candidate))
+
+        report_path = None
+        if not dry_run:
+            report_rows = store.evidence_rows()
+            report_path = write_evidence_report(
+                report_rows,
+                output_dir=output_dir or config.reports.output_dir,
+                scan_id=scan_id,
+                mailbox_id=config.mailbox.id,
+            )
+        finished_at = datetime.now(UTC)
+        scan = MailboxScan(
+            scan_id=scan_id,
+            mailbox_id=config.mailbox.id,
+            started_at=started_at,
+            finished_at=finished_at,
+            scan_mode="full_rescan" if full_rescan else config.scan.mode,
+            folder=config.mailbox.folder,
+            status="completed",
+            messages_seen=messages_seen,
+            messages_new=messages_new,
+            messages_parsed=messages_parsed,
+            evidence_events_created=evidence_created,
+            report_path=str(report_path) if report_path else None,
+        )
+        if not dry_run:
+            store.insert_scan(scan)
+        return ScanResult(scan=scan, report_path=report_path)
+    finally:
+        store.close()
+
+
+def _enrich_candidate(candidate, inbound, parsed):
+    metadata = {
+        **candidate.metadata,
+        "source_from": inbound.from_address,
+        "source_to": ", ".join(inbound.to_addresses),
+        "source_subject": inbound.subject,
+        "message_id_header": inbound.message_id_header,
+        "original_recipient": parsed.original_recipient,
+        "first_seen_at": inbound.first_seen_at.isoformat(),
+        "last_seen_at": inbound.last_seen_at.isoformat(),
+    }
+    return type(candidate)(
+        **{
+            **candidate.__dict__,
+            "metadata": metadata,
+        }
+    )
--- a/src/email_connect/storage.py
+++ b/src/email_connect/storage.py
@@ -0,0 +1,212 @@
+from __future__ import annotations
+
+import json
+import sqlite3
+from datetime import UTC, datetime
+from pathlib import Path
+
+from .models import EmailEvidenceCandidate, InboundMailboxMessage, MailboxScan, ParsedMailboxMessage
+
+
+class StateStore:
+    def __init__(self, path: str | Path) -> None:
+        self.path = Path(path)
+        self.path.parent.mkdir(parents=True, exist_ok=True)
+        self.conn = sqlite3.connect(self.path)
+        self.conn.row_factory = sqlite3.Row
+        self.init_schema()
+
+    def close(self) -> None:
+        self.conn.close()
+
+    def init_schema(self) -> None:
+        self.conn.executescript(
+            """
+            create table if not exists mailbox_scans (
+              scan_id text primary key,
+              mailbox_id text not null,
+              started_at text not null,
+              finished_at text,
+              scan_mode text not null,
+              folder text not null,
+              status text not null,
+              messages_seen integer not null,
+              messages_new integer not null,
+              messages_parsed integer not null,
+              evidence_events_created integer not null,
+              report_path text,
+              since text
+            );
+
+            create table if not exists mailbox_messages (
+              mailbox_message_id text primary key,
+              mailbox_id text not null,
+              imap_uid text,
+              message_id_header text,
+              received_at text,
+              from_address text,
+              to_addresses_json text not null,
+              subject text,
+              raw_headers_ref text,
+              raw_message_ref text,
+              first_seen_at text not null,
+              last_seen_at text not null,
+              deduplication_key text not null unique
+            );
+
+            create table if not exists parsed_messages (
+              parsed_message_id text primary key,
+              mailbox_message_id text not null,
+              parser_version text not null,
+              message_class text not null,
+              affected_email_address text,
+              original_message_id text,
+              original_recipient text,
+              smtp_status_code text,
+              enhanced_status_code text,
+              reason_code text,
+              confidence text not null,
+              parsed_at text not null,
+              notes_json text not null
+            );
+
+            create table if not exists evidence_candidates (
+              evidence_candidate_id text primary key,
+              mailbox_message_id text not null,
+              parsed_message_id text not null,
+              event_type text not null,
+              assessment_category text not null,
+              assessment_subclass text not null,
+              affected_email_address text,
+              original_message_id text,
+              confidence text not null,
+              evidence_strength text not null,
+              occurred_at text,
+              observed_at text not null,
+              deduplication_key text not null unique,
+              raw_message_ref text,
+              notes_json text not null,
+              metadata_json text not null
+            );
+            """
+        )
+        self.conn.commit()
+
+    def upsert_message(self, message: InboundMailboxMessage) -> bool:
+        existing = self.conn.execute(
+            "select mailbox_message_id from mailbox_messages where deduplication_key = ?",
+            (message.deduplication_key,),
+        ).fetchone()
+        if existing:
+            self.conn.execute(
+                "update mailbox_messages set last_seen_at = ? where deduplication_key = ?",
+                (_dt(message.last_seen_at), message.deduplication_key),
+            )
+            self.conn.commit()
+            return False
+        self.conn.execute(
+            """
+            insert into mailbox_messages values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+            """,
+            (
+                message.mailbox_message_id,
+                message.mailbox_id,
+                message.imap_uid,
+                message.message_id_header,
+                _dt(message.received_at),
+                message.from_address,
+                json.dumps(message.to_addresses),
+                message.subject,
+                message.raw_headers_ref,
+                message.raw_message_ref,
+                _dt(message.first_seen_at),
+                _dt(message.last_seen_at),
+                message.deduplication_key,
+            ),
+        )
+        self.conn.commit()
+        return True
+
+    def insert_parsed(self, parsed: ParsedMailboxMessage) -> None:
+        self.conn.execute(
+            """
+            insert or replace into parsed_messages values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+            """,
+            (
+                parsed.parsed_message_id,
+                parsed.mailbox_message_id,
+                parsed.parser_version,
+                parsed.message_class.value,
+                parsed.affected_email_address,
+                parsed.original_message_id,
+                parsed.original_recipient,
+                parsed.smtp_status_code,
+                parsed.enhanced_status_code,
+                parsed.reason_code,
+                parsed.confidence.value,
+                _dt(parsed.parsed_at),
+                json.dumps(parsed.notes),
+            ),
+        )
+        self.conn.commit()
+
+    def insert_evidence(self, candidate: EmailEvidenceCandidate) -> bool:
+        try:
+            self.conn.execute(
+                """
+                insert into evidence_candidates values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+                """,
+                (
+                    candidate.evidence_candidate_id,
+                    candidate.mailbox_message_id,
+                    candidate.parsed_message_id,
+                    candidate.event_type,
+                    candidate.assessment_category.value,
+                    candidate.assessment_subclass,
+                    candidate.affected_email_address,
+                    candidate.original_message_id,
+                    candidate.confidence.value,
+                    candidate.evidence_strength.value,
+                    _dt(candidate.occurred_at),
+                    _dt(candidate.observed_at),
+                    candidate.deduplication_key,
+                    candidate.raw_message_ref,
+                    json.dumps(candidate.notes),
+                    json.dumps(candidate.metadata),
+                ),
+            )
+        except sqlite3.IntegrityError:
+            return False
+        self.conn.commit()
+        return True
+
+    def insert_scan(self, scan: MailboxScan) -> None:
+        self.conn.execute(
+            """
+            insert or replace into mailbox_scans values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+            """,
+            (
+                scan.scan_id,
+                scan.mailbox_id,
+                _dt(scan.started_at),
+                _dt(scan.finished_at),
+                scan.scan_mode,
+                scan.folder,
+                scan.status,
+                scan.messages_seen,
+                scan.messages_new,
+                scan.messages_parsed,
+                scan.evidence_events_created,
+                scan.report_path,
+                _dt(scan.since),
+            ),
+        )
+        self.conn.commit()
+
+    def evidence_rows(self) -> list[dict]:
+        rows = self.conn.execute("select * from evidence_candidates order by observed_at, event_type").fetchall()
+        return [dict(row) for row in rows]
+
+
+def _dt(value: datetime | None) -> str | None:
+    return value.isoformat() if value else None
--- a/tests/fixtures/mailbox/hard_bounce.eml
+++ b/tests/fixtures/mailbox/hard_bounce.eml
@@ -0,0 +1,12 @@
+From: Mail Delivery Subsystem <mailer-daemon@example.net>
+To: sender@example.com
+Subject: Delivery Status Notification (Failure)
+Date: Tue, 02 Jun 2026 10:00:00 +0000
+Message-ID: <hard-bounce@example.net>
+Content-Type: text/plain; charset=utf-8
+
+Delivery failure.
+Final-Recipient: rfc822; missing@example.com
+Action: failed
+Status: 5.1.1
+Diagnostic-Code: smtp; 550 5.1.1 User unknown
--- a/tests/fixtures/mailbox/human_reply.eml
+++ b/tests/fixtures/mailbox/human_reply.eml
@@ -0,0 +1,8 @@
+From: Recipient <recipient@example.com>
+To: sender@example.com
+Subject: Re: Your notification
+Date: Tue, 02 Jun 2026 10:03:00 +0000
+Message-ID: <reply@example.com>
+Content-Type: text/plain; charset=utf-8
+
+Thanks, I received this and will review it today.
--- a/tests/fixtures/mailbox/out_of_office.eml
+++ b/tests/fixtures/mailbox/out_of_office.eml
@@ -0,0 +1,9 @@
+From: Recipient <recipient@example.com>
+To: sender@example.com
+Subject: Auto-reply: Out of office
+Date: Tue, 02 Jun 2026 10:02:00 +0000
+Message-ID: <ooo@example.com>
+Auto-Submitted: auto-replied
+Content-Type: text/plain; charset=utf-8
+
+I am out of office until next week.
--- a/tests/fixtures/mailbox/soft_bounce.eml
+++ b/tests/fixtures/mailbox/soft_bounce.eml
@@ -0,0 +1,12 @@
+From: Mail Delivery Subsystem <mailer-daemon@example.net>
+To: sender@example.com
+Subject: Delivery temporarily delayed
+Date: Tue, 02 Jun 2026 10:01:00 +0000
+Message-ID: <soft-bounce@example.net>
+Content-Type: text/plain; charset=utf-8
+
+Delivery Status Notification.
+Final-Recipient: rfc822; full@example.com
+Action: delayed
+Status: 4.2.2
+Diagnostic-Code: smtp; 452 4.2.2 Mailbox full
--- a/tests/test_parser.py
+++ b/tests/test_parser.py
@@ -0,0 +1,43 @@
+from __future__ import annotations
+
+import unittest
+from pathlib import Path
+
+from email_connect.models import MessageClass
+from email_connect.parser import parse_message_file
+
+
+FIXTURES = Path(__file__).parent / "fixtures" / "mailbox"
+
+
+class ParserTests(unittest.TestCase):
+    def test_hard_bounce_maps_to_permanent_rejection(self) -> None:
+        _inbound, parsed, candidate = parse_message_file(FIXTURES / "hard_bounce.eml", mailbox_id="test")
+        self.assertEqual(parsed.message_class, MessageClass.HARD_BOUNCE)
+        self.assertIsNotNone(candidate)
+        self.assertEqual(candidate.event_type, "notification.endpoint.rejected_permanent")
+        self.assertEqual(candidate.assessment_subclass, "fail.hard_bounce")
+
+    def test_soft_bounce_maps_to_temporary_rejection(self) -> None:
+        _inbound, parsed, candidate = parse_message_file(FIXTURES / "soft_bounce.eml", mailbox_id="test")
+        self.assertEqual(parsed.message_class, MessageClass.SOFT_BOUNCE)
+        self.assertIsNotNone(candidate)
+        self.assertEqual(candidate.event_type, "notification.endpoint.rejected_temporary")
+
+    def test_out_of_office_stays_undef(self) -> None:
+        _inbound, parsed, candidate = parse_message_file(FIXTURES / "out_of_office.eml", mailbox_id="test")
+        self.assertEqual(parsed.message_class, MessageClass.OUT_OF_OFFICE)
+        self.assertIsNotNone(candidate)
+        self.assertEqual(candidate.assessment_category.value, "undef")
+        self.assertEqual(candidate.event_type, "interaction.out_of_office_received")
+
+    def test_human_reply_is_email_channel_success_only(self) -> None:
+        _inbound, parsed, candidate = parse_message_file(FIXTURES / "human_reply.eml", mailbox_id="test")
+        self.assertEqual(parsed.message_class, MessageClass.HUMAN_REPLY)
+        self.assertIsNotNone(candidate)
+        self.assertEqual(candidate.event_type, "interaction.reply_received")
+        self.assertEqual(candidate.assessment_subclass, "success.reply_received")
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/tests/test_scanner.py
+++ b/tests/test_scanner.py
@@ -0,0 +1,37 @@
+from __future__ import annotations
+
+import tempfile
+import unittest
+from pathlib import Path
+
+from email_connect.config import AppConfig, MailboxConfig, ReportsConfig, ScanConfig, SourceConfig, StorageConfig
+from email_connect.scanner import scan_mailbox
+
+
+FIXTURES = Path(__file__).parent / "fixtures" / "mailbox"
+
+
+class ScannerTests(unittest.TestCase):
+    def test_scan_fixture_directory_writes_report_and_deduplicates(self) -> None:
+        with tempfile.TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            config = AppConfig(
+                mailbox=MailboxConfig(id="test-mailbox", protocol="fixture"),
+                scan=ScanConfig(),
+                storage=StorageConfig(path=str(root / "state.sqlite")),
+                reports=ReportsConfig(output_dir=str(root / "reports")),
+                source=SourceConfig(fixture_dir=str(FIXTURES)),
+            )
+            first = scan_mailbox(config)
+            second = scan_mailbox(config)
+
+            self.assertEqual(first.scan.messages_seen, 4)
+            self.assertEqual(first.scan.messages_new, 4)
+            self.assertGreaterEqual(first.scan.evidence_events_created, 4)
+            self.assertEqual(second.scan.messages_new, 0)
+            self.assertEqual(second.scan.evidence_events_created, 0)
+            self.assertTrue(first.report_path and first.report_path.exists())
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/workplans/EMAIL-WP-0001-repo-onboarding.md
+++ b/workplans/EMAIL-WP-0001-repo-onboarding.md
@@ -4,7 +4,7 @@ type: workplan
 title: "Repository Onboarding and Implementation Foundation"
 domain: custodian
 repo: email-connect
-status: active
+status: finished
 owner: codex
 topic_slug: custodian
 created: "2026-06-02"
@@ -55,7 +55,7 @@ Done when:

 ```task
 id: EMAIL-WP-0001-T02
-status: todo
+status: done
 priority: high
 state_hub_task_id: "fdfd8b96-7326-414f-8126-79bb3a21b950"
 ```
@@ -76,7 +76,7 @@ The architecture note should cover:

 ```task
 id: EMAIL-WP-0001-T03
-status: todo
+status: done
 priority: high
 state_hub_task_id: "ef1eb769-dfa0-4b46-8633-274d90962423"
 ```
@@ -105,7 +105,7 @@ click telemetry as proof of human awareness or result satisfaction.

 ```task
 id: EMAIL-WP-0001-T04
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "4b94e544-5aad-4c38-8fe3-eed17af79971"
 ```
--- a/workplans/EMAIL-WP-0002-mvp-mailbox-evidence-scanner.md
+++ b/workplans/EMAIL-WP-0002-mvp-mailbox-evidence-scanner.md
@@ -4,7 +4,7 @@ type: workplan
 title: "MVP Mailbox Evidence Scanner"
 domain: custodian
 repo: email-connect
-status: ready
+status: active
 owner: codex
 topic_slug: custodian
 created: "2026-06-02"
@@ -688,7 +688,7 @@ Endpoint quality is diagnostic and must not be treated as coordination success.

 ```task
 id: EMAIL-WP-0002-T01
-status: todo
+status: done
 priority: high
 state_hub_task_id: "3a17215d-62a9-48ef-877f-a6fbc7e95a22"
 ```
@@ -716,7 +716,7 @@ Config file is loaded and validated.

 ```task
 id: EMAIL-WP-0002-T02
-status: todo
+status: progress
 priority: high
 state_hub_task_id: "25a4da12-1bcd-4c6d-a0eb-a2f525b9c4b9"
 ```
@@ -744,7 +744,7 @@ CLI can connect to mailbox and list/fetch messages without modifying mailbox.

 ```task
 id: EMAIL-WP-0002-T03
-status: todo
+status: progress
 priority: high
 state_hub_task_id: "16b95a6b-1375-4c91-8b78-0b75d51e0aeb"
 ```
@@ -773,7 +773,7 @@ Full rescan can revisit all messages while preserving deduplication.

 ```task
 id: EMAIL-WP-0002-T04
-status: todo
+status: done
 priority: high
 state_hub_task_id: "5a50cd85-b0ab-4017-aba0-b2087068abb4"
 ```
@@ -802,7 +802,7 @@ Scanner extracts basic metadata and text from representative bounce and reply me

 ```task
 id: EMAIL-WP-0002-T05
-status: todo
+status: progress
 priority: high
 state_hub_task_id: "8ea826d1-0add-4573-9bb4-2b73adefba55"
 ```
@@ -831,7 +831,7 @@ Representative hard and soft bounce samples are classified correctly.

 ```task
 id: EMAIL-WP-0002-T06
-status: todo
+status: progress
 priority: high
 state_hub_task_id: "4d94a332-173b-4787-8fb2-27aa63db6a8d"
 ```
@@ -880,7 +880,7 @@ Representative complaint and unsubscribe examples are classified.

 ```task
 id: EMAIL-WP-0002-T08
-status: todo
+status: progress
 priority: high
 state_hub_task_id: "6d62dea0-f416-4c0b-80a0-7c16422b8e5f"
 ```
@@ -932,7 +932,7 @@ Complaint/unsubscribe updates suppression state.

 ```task
 id: EMAIL-WP-0002-T10
-status: todo
+status: progress
 priority: medium
 state_hub_task_id: "5ab35176-d6c2-4c73-b7b3-bde4c097e3ee"
 ```
@@ -959,7 +959,7 @@ Report can be opened in spreadsheet tools.

 ```task
 id: EMAIL-WP-0002-T11
-status: todo
+status: progress
 priority: high
 state_hub_task_id: "514fa099-781b-4590-aae4-c28970413b3f"
 ```
@@ -989,7 +989,7 @@ Automated tests verify expected classification and normalized event output.

 ```task
 id: EMAIL-WP-0002-T12
-status: todo
+status: progress
 priority: medium
 state_hub_task_id: "a5f7067e-87be-4438-ba35-b12d06a8181e"
 ```