generated from coulomb/repo-seed
feat: start mailbox evidence scanner
This commit is contained in:
5
.gitignore
vendored
5
.gitignore
vendored
@@ -171,6 +171,9 @@ cython_debug/
|
||||
# Ruff stuff:
|
||||
.ruff_cache/
|
||||
|
||||
# email-connect local runtime artifacts
|
||||
.email-connect/
|
||||
reports/*.csv
|
||||
|
||||
# PyPI configuration file
|
||||
.pypirc
|
||||
|
||||
|
||||
35
README.md
35
README.md
@@ -1,3 +1,34 @@
|
||||
# repo-seed
|
||||
# email-connect
|
||||
|
||||
A git repository template to bootstrap coulomb projects from.
|
||||
Headless, provider-neutral email communication and evidence service.
|
||||
|
||||
The first implementation slice is the Mailbox Evidence Scanner MVP: scan a
|
||||
return mailbox or fixture directory, classify inbound email-channel evidence,
|
||||
store scan state locally, and generate timestamped CSV reports without
|
||||
overclaiming delivery, awareness, or coordination success.
|
||||
|
||||
## Quickstart
|
||||
|
||||
```bash
|
||||
PYTHONPATH=src python3 -m unittest discover -s tests
|
||||
PYTHONPATH=src python3 -m email_connect.cli adapter-descriptor
|
||||
PYTHONPATH=src python3 -m email_connect.cli scan-mailbox --config config/mailbox.example.yml --out reports/
|
||||
```
|
||||
|
||||
The example config uses `tests/fixtures/mailbox` as a mailbox source. Runtime
|
||||
state is written to `.email-connect/state.sqlite`; generated CSV reports are
|
||||
written to `reports/`.
|
||||
|
||||
## Current Scope
|
||||
|
||||
- Coordination-engine spec review and references.
|
||||
- Initial adapter descriptor, capability profile, evidence ceiling, and
|
||||
limitations.
|
||||
- Conservative mailbox message parser and evidence mapper.
|
||||
- SQLite state store with message and evidence deduplication.
|
||||
- CSV report generation.
|
||||
- Golden fixture tests for hard bounce, soft bounce, out-of-office, and human
|
||||
reply signals.
|
||||
|
||||
IMAP access, provider webhooks, outbound sending, suppression workflows, and a
|
||||
UI are planned work, not complete in this first slice.
|
||||
|
||||
31
config/mailbox.example.yml
Normal file
31
config/mailbox.example.yml
Normal file
@@ -0,0 +1,31 @@
|
||||
mailbox:
|
||||
id: return-mailbox-default
|
||||
protocol: fixture
|
||||
host: null
|
||||
port: 993
|
||||
tls: true
|
||||
username_env: EMAIL_CONNECT_IMAP_USER
|
||||
password_env: EMAIL_CONNECT_IMAP_PASSWORD
|
||||
folder: INBOX
|
||||
|
||||
source:
|
||||
fixture_dir: tests/fixtures/mailbox
|
||||
|
||||
scan:
|
||||
mode: incremental
|
||||
max_messages_per_run: 5000
|
||||
since: null
|
||||
include_seen: true
|
||||
mark_seen: false
|
||||
store_raw_headers: true
|
||||
store_raw_body: false
|
||||
store_raw_message_ref: true
|
||||
|
||||
storage:
|
||||
path: .email-connect/state.sqlite
|
||||
|
||||
reports:
|
||||
output_dir: reports
|
||||
include_all_evidence: true
|
||||
include_unknown_messages: true
|
||||
timestamp_timezone: UTC
|
||||
46
docs/coordination-engine-email-spec-review.md
Normal file
46
docs/coordination-engine-email-spec-review.md
Normal file
@@ -0,0 +1,46 @@
|
||||
# coordination-engine Email Spec Review
|
||||
|
||||
## Source Specs
|
||||
|
||||
Reviewed local coordination-engine specifications:
|
||||
|
||||
- `/home/worsch/coordination-engine/spec/EmailAdapterSpecification.md`
|
||||
- `/home/worsch/coordination-engine/spec/AdapterInterfaceSpecification.md`
|
||||
- `/home/worsch/coordination-engine/spec/RuntimeArchitectureAndAdapterSubsystem.md`
|
||||
- `/home/worsch/coordination-engine/spec/ProductRequirementsDocument.md`
|
||||
|
||||
These are referenced, not copied, so `email-connect` stays aligned with the
|
||||
authoritative coordination-engine checkout.
|
||||
|
||||
## Contract Points For email-connect
|
||||
|
||||
- `email-connect` is a `notification`, `communication`, and `interaction`
|
||||
adapter.
|
||||
- It reports email-channel evidence and advisory assessment. It does not own
|
||||
coordination case result evaluation.
|
||||
- Ambiguous provider or mailbox signals must map to the weakest safe normalized
|
||||
event.
|
||||
- Provider and mailbox events must preserve raw references and native status
|
||||
mappings.
|
||||
- The adapter must expose or enable production of normalized `EvidenceEvent`
|
||||
records.
|
||||
- The adapter should expose endpoint quality diagnostics, but endpoint quality
|
||||
is not coordination success.
|
||||
- Email's positive evidence ceiling is limited: email tracking cannot prove
|
||||
human awareness, identity, authority, payload access, or non-repudiation.
|
||||
- Golden tests must include overclaim prevention, especially "provider
|
||||
delivered" not becoming awareness or payload delivery.
|
||||
|
||||
## First Implementation Implications
|
||||
|
||||
- The mailbox scanner emits `EmailEvidenceCandidate` rows that are shaped to
|
||||
become coordination-engine `EvidenceEvent` records later.
|
||||
- Classifiers preserve `unknown_return_message` and `parse_failed` instead of
|
||||
silently discarding uncertainty.
|
||||
- Out-of-office replies update diagnostics only; they do not prove awareness or
|
||||
reachability.
|
||||
- Human replies are email-channel success signals, but not legal acceptance or
|
||||
coordination result satisfaction.
|
||||
- CSV reports include event type, assessment category/subclass, confidence,
|
||||
evidence strength, observed time, occurred time, raw reference, and
|
||||
deduplication key for auditability.
|
||||
48
docs/email-evidence-canon.md
Normal file
48
docs/email-evidence-canon.md
Normal file
@@ -0,0 +1,48 @@
|
||||
# Email Evidence Canon
|
||||
|
||||
## Rule
|
||||
|
||||
Email events are evidence, not result satisfaction.
|
||||
|
||||
The scanner reports email-channel facts and uncertainty. A downstream
|
||||
coordination runtime decides whether those facts satisfy a coordination case.
|
||||
|
||||
## Initial Message Classes
|
||||
|
||||
- `hard_bounce`
|
||||
- `soft_bounce`
|
||||
- `delayed_delivery_notice`
|
||||
- `final_delivery_failure`
|
||||
- `out_of_office`
|
||||
- `human_reply`
|
||||
- `complaint_or_abuse`
|
||||
- `unsubscribe_or_opt_out`
|
||||
- `challenge_response`
|
||||
- `unknown_return_message`
|
||||
- `unrelated_message`
|
||||
- `parse_failed`
|
||||
|
||||
## Initial Normalized Events
|
||||
|
||||
| Message class | Normalized event | Assessment |
|
||||
| --- | --- | --- |
|
||||
| `hard_bounce` | `notification.endpoint.rejected_permanent` | `fail.hard_bounce` |
|
||||
| `soft_bounce` | `notification.endpoint.rejected_temporary` | `undef.deferred` |
|
||||
| `delayed_delivery_notice` | `notification.endpoint.deferred` | `undef.deferred` |
|
||||
| `final_delivery_failure` | `notification.endpoint.rejected_permanent` | `fail.expired_without_delivery` |
|
||||
| `out_of_office` | `interaction.out_of_office_received` | `undef.out_of_office` |
|
||||
| `human_reply` | `interaction.reply_received` | `success.reply_received` |
|
||||
| `complaint_or_abuse` | `notification.channel.complaint_received` | `fail.complaint_received` |
|
||||
| `unsubscribe_or_opt_out` | `notification.channel.unsubscribe_received` | `fail.unsubscribed` |
|
||||
| `unknown_return_message` | `notification.endpoint.unknown` | `undef.conflicting_evidence` |
|
||||
| `challenge_response` | `interaction.unverified_actor_interaction` | `undef.identity_uncertain` |
|
||||
|
||||
## Overclaim Prevention
|
||||
|
||||
- No bounce found does not mean delivery success.
|
||||
- Provider acceptance does not mean endpoint acceptance.
|
||||
- Endpoint acceptance does not mean inbox placement.
|
||||
- Out-of-office does not prove recipient awareness or action.
|
||||
- Human reply does not prove legal acceptance.
|
||||
- Unknown return messages remain visible.
|
||||
- Scanner and proxy interactions must stay below identity-bound interaction.
|
||||
98
docs/initial-runtime-architecture.md
Normal file
98
docs/initial-runtime-architecture.md
Normal file
@@ -0,0 +1,98 @@
|
||||
# Initial Runtime Architecture
|
||||
|
||||
## Status
|
||||
|
||||
This is the first implementation architecture for the mailbox evidence scanner
|
||||
slice. It is intentionally small and stdlib-only so the repo can run before a
|
||||
larger service stack is chosen.
|
||||
|
||||
## Service Boundary
|
||||
|
||||
The first slice is a CLI scanner:
|
||||
|
||||
```text
|
||||
email-connect scan-mailbox --config config/mailbox.example.yml --out reports/
|
||||
```
|
||||
|
||||
It scans an inbound return mailbox source, classifies messages, stores scan
|
||||
state in SQLite, and writes timestamped CSV evidence reports.
|
||||
|
||||
The initial source implementation supports fixture directories. The IMAP
|
||||
connector remains the next mailbox boundary to complete under
|
||||
`EMAIL-WP-0002-T02`.
|
||||
|
||||
## Package Layout
|
||||
|
||||
```text
|
||||
src/email_connect/
|
||||
adapter_contract.py # coordination-engine descriptor and evidence ceiling
|
||||
cli.py # command line entry points
|
||||
config.py # config loading
|
||||
evidence.py # native class to normalized evidence mapping
|
||||
models.py # mailbox, parse, evidence, endpoint quality dataclasses
|
||||
parser.py # MIME/header parsing and conservative classification
|
||||
reporting.py # CSV report generation
|
||||
scanner.py # scan orchestration
|
||||
storage.py # SQLite state store
|
||||
```
|
||||
|
||||
## Persistence
|
||||
|
||||
SQLite is the MVP store. The initial schema includes:
|
||||
|
||||
- `mailbox_scans`
|
||||
- `mailbox_messages`
|
||||
- `parsed_messages`
|
||||
- `evidence_candidates`
|
||||
|
||||
Message deduplication is keyed by mailbox ID, IMAP UID when present, message ID,
|
||||
received timestamp, sender, subject hash, and body hash. Evidence
|
||||
deduplication follows the workplan fields: message, parser version, normalized
|
||||
event, affected recipient, original message, SMTP/enhanced status, and reason.
|
||||
|
||||
## Evidence Mapping
|
||||
|
||||
Parser output is represented as `ParsedMailboxMessage`. The mapper converts it
|
||||
to `EmailEvidenceCandidate` using coordination-engine event names and advisory
|
||||
assessment classes.
|
||||
|
||||
Examples:
|
||||
|
||||
- `hard_bounce` -> `notification.endpoint.rejected_permanent`
|
||||
- `soft_bounce` -> `notification.endpoint.rejected_temporary`
|
||||
- `out_of_office` -> `interaction.out_of_office_received`
|
||||
- `human_reply` -> `interaction.reply_received`
|
||||
|
||||
The mapper does not emit evidence for unrelated messages. Unknown return
|
||||
messages stay visible as `notification.endpoint.unknown`.
|
||||
|
||||
## coordination-engine Alignment
|
||||
|
||||
The implementation keeps these coordination-engine concepts explicit:
|
||||
|
||||
- adapter descriptor
|
||||
- adapter capability profile
|
||||
- evidence ceiling
|
||||
- advisory assessment
|
||||
- endpoint quality update shape
|
||||
- event observation and raw reference preservation
|
||||
- golden tests for overclaim prevention
|
||||
|
||||
Email evidence remains below the coordination result layer. The scanner does
|
||||
not infer inbox placement, human awareness, legal acceptance, payload access, or
|
||||
case success.
|
||||
|
||||
## Provider Boundary
|
||||
|
||||
Provider webhook ingestion and outbound send APIs are deliberately outside this
|
||||
slice. The mailbox scanner uses the same evidence model so future provider
|
||||
events can enter through a parallel ingestion path and converge at the same
|
||||
normalization layer.
|
||||
|
||||
## Development Commands
|
||||
|
||||
```bash
|
||||
PYTHONPATH=src python3 -m unittest discover -s tests
|
||||
PYTHONPATH=src python3 -m email_connect.cli adapter-descriptor
|
||||
PYTHONPATH=src python3 -m email_connect.cli scan-mailbox --config config/mailbox.example.yml --out reports/
|
||||
```
|
||||
19
pyproject.toml
Normal file
19
pyproject.toml
Normal file
@@ -0,0 +1,19 @@
|
||||
[build-system]
|
||||
requires = ["setuptools>=68"]
|
||||
build-backend = "setuptools.build_meta"
|
||||
|
||||
[project]
|
||||
name = "email-connect"
|
||||
version = "0.1.0"
|
||||
description = "Provider-neutral email communication and evidence service."
|
||||
readme = "README.md"
|
||||
requires-python = ">=3.11"
|
||||
license = { file = "LICENSE" }
|
||||
authors = [{ name = "Coulomb" }]
|
||||
dependencies = []
|
||||
|
||||
[project.scripts]
|
||||
email-connect = "email_connect.cli:main"
|
||||
|
||||
[tool.setuptools.packages.find]
|
||||
where = ["src"]
|
||||
1
reports/.gitkeep
Normal file
1
reports/.gitkeep
Normal file
@@ -0,0 +1 @@
|
||||
|
||||
3
src/email_connect/__init__.py
Normal file
3
src/email_connect/__init__.py
Normal file
@@ -0,0 +1,3 @@
|
||||
"""email-connect package."""
|
||||
|
||||
__version__ = "0.1.0"
|
||||
96
src/email_connect/adapter_contract.py
Normal file
96
src/email_connect/adapter_contract.py
Normal file
@@ -0,0 +1,96 @@
|
||||
from __future__ import annotations
|
||||
|
||||
ADAPTER_ID = "email-connect"
|
||||
ADAPTER_CONTRACT_VERSION = "1.1"
|
||||
|
||||
EMITTED_EVENT_TYPES = [
|
||||
"notification.endpoint.rejected_permanent",
|
||||
"notification.endpoint.rejected_temporary",
|
||||
"notification.endpoint.deferred",
|
||||
"notification.channel.complaint_received",
|
||||
"notification.channel.unsubscribe_received",
|
||||
"interaction.reply_received",
|
||||
"interaction.out_of_office_received",
|
||||
"notification.endpoint.unknown",
|
||||
]
|
||||
|
||||
|
||||
def adapter_descriptor() -> dict:
|
||||
"""Return the initial coordination-engine adapter descriptor."""
|
||||
|
||||
return {
|
||||
"adapter_id": ADAPTER_ID,
|
||||
"adapter_name": "email-connect",
|
||||
"adapter_version": "0.1.0",
|
||||
"adapter_contract_version": ADAPTER_CONTRACT_VERSION,
|
||||
"adapter_types": ["notification", "communication", "interaction"],
|
||||
"deployment_mode": "external",
|
||||
"capability_profile": {
|
||||
"can_notify": True,
|
||||
"can_publish": False,
|
||||
"can_deliver_payload": False,
|
||||
"can_collect_payload": False,
|
||||
"can_grant_access": False,
|
||||
"can_revoke_access": False,
|
||||
"can_observe_interaction": True,
|
||||
"can_observe_identity": False,
|
||||
"can_observe_authority": False,
|
||||
"can_emit_delivery_receipts": False,
|
||||
"can_emit_display_or_read_receipts": False,
|
||||
"can_emit_return_or_failure_events": True,
|
||||
"can_emit_late_events": True,
|
||||
"can_cancel_after_dispatch": False,
|
||||
"supports_webhooks": False,
|
||||
"supports_polling": True,
|
||||
"supports_idempotency": "adapter_managed",
|
||||
"supports_endpoint_quality": True,
|
||||
"supports_native_status_mapping": True,
|
||||
"supports_golden_tests": True,
|
||||
"limitations": [
|
||||
"Mailbox evidence cannot prove inbox placement.",
|
||||
"Reply, open, and click signals do not prove coordination result satisfaction.",
|
||||
],
|
||||
},
|
||||
"supported_actions": [],
|
||||
"emitted_event_types": EMITTED_EVENT_TYPES,
|
||||
"supported_channels": ["email"],
|
||||
"supported_endpoint_types": ["email_address"],
|
||||
"evidence_profile": {
|
||||
"native_status_mapping_ref": "src/email_connect/evidence.py",
|
||||
"golden_tests_ref": "tests/fixtures",
|
||||
},
|
||||
"evidence_ceiling": {
|
||||
"max_positive_event": "interaction.reply_received",
|
||||
"max_positive_strength": "medium",
|
||||
"can_prove_human_awareness": False,
|
||||
"can_prove_payload_access": False,
|
||||
"can_prove_payload_delivery": False,
|
||||
"can_prove_identity": False,
|
||||
"can_prove_authority": False,
|
||||
"can_prove_non_repudiation": False,
|
||||
"limitations": [
|
||||
"Email cannot reliably prove intended-recipient awareness.",
|
||||
"Mailbox scanning observes return-channel evidence only.",
|
||||
],
|
||||
},
|
||||
"assurance_capability": {
|
||||
"awareness_assurance": "weak",
|
||||
"delivery_assurance": "none",
|
||||
"identity_assurance": "none",
|
||||
"authority_assurance": "none",
|
||||
"non_repudiation_assurance": "none",
|
||||
},
|
||||
"identity_profile": {
|
||||
"can_identify_actor": False,
|
||||
"identity_limitations": ["Inbound mailbox evidence is not authenticated actor evidence."],
|
||||
},
|
||||
"late_event_policy": {
|
||||
"accepts_late_events": True,
|
||||
"late_event_notes": ["Bounces and replies may arrive after earlier acceptance evidence."],
|
||||
},
|
||||
"limitations": [
|
||||
"No outbound sending in the first mailbox-scanner MVP.",
|
||||
"No provider webhook integration in the first mailbox-scanner MVP.",
|
||||
"No legal delivery assessment.",
|
||||
],
|
||||
}
|
||||
55
src/email_connect/cli.py
Normal file
55
src/email_connect/cli.py
Normal file
@@ -0,0 +1,55 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import json
|
||||
from pathlib import Path
|
||||
|
||||
from .adapter_contract import adapter_descriptor
|
||||
from .config import load_config
|
||||
from .scanner import scan_mailbox
|
||||
|
||||
|
||||
def main(argv: list[str] | None = None) -> int:
|
||||
parser = argparse.ArgumentParser(prog="email-connect")
|
||||
sub = parser.add_subparsers(dest="command", required=True)
|
||||
|
||||
scan = sub.add_parser("scan-mailbox", help="Scan a return mailbox or fixture directory.")
|
||||
scan.add_argument("--config", required=True)
|
||||
scan.add_argument("--out", default=None)
|
||||
scan.add_argument("--full-rescan", action="store_true")
|
||||
scan.add_argument("--since", default=None)
|
||||
scan.add_argument("--report-only-new", action="store_true")
|
||||
scan.add_argument("--dry-run", action="store_true")
|
||||
scan.add_argument("--fixture-dir", default=None)
|
||||
|
||||
sub.add_parser("adapter-descriptor", help="Print the initial coordination-engine adapter descriptor.")
|
||||
|
||||
args = parser.parse_args(argv)
|
||||
if args.command == "adapter-descriptor":
|
||||
print(json.dumps(adapter_descriptor(), indent=2, sort_keys=True))
|
||||
return 0
|
||||
|
||||
if args.command == "scan-mailbox":
|
||||
config = load_config(args.config)
|
||||
result = scan_mailbox(
|
||||
config,
|
||||
output_dir=args.out,
|
||||
full_rescan=args.full_rescan,
|
||||
report_only_new=args.report_only_new,
|
||||
dry_run=args.dry_run,
|
||||
fixture_dir=args.fixture_dir,
|
||||
)
|
||||
print(f"scan_id={result.scan.scan_id}")
|
||||
print(f"messages_seen={result.scan.messages_seen}")
|
||||
print(f"messages_new={result.scan.messages_new}")
|
||||
print(f"messages_parsed={result.scan.messages_parsed}")
|
||||
print(f"evidence_events_created={result.scan.evidence_events_created}")
|
||||
if result.report_path:
|
||||
print(f"report_path={Path(result.report_path)}")
|
||||
return 0
|
||||
|
||||
return 2
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
raise SystemExit(main())
|
||||
140
src/email_connect/config.py
Normal file
140
src/email_connect/config.py
Normal file
@@ -0,0 +1,140 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class MailboxConfig:
|
||||
id: str
|
||||
protocol: str = "imap"
|
||||
host: str | None = None
|
||||
port: int = 993
|
||||
tls: bool = True
|
||||
username_env: str | None = None
|
||||
password_env: str | None = None
|
||||
folder: str = "INBOX"
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class ScanConfig:
|
||||
mode: str = "incremental"
|
||||
max_messages_per_run: int = 5000
|
||||
since: str | None = None
|
||||
include_seen: bool = True
|
||||
mark_seen: bool = False
|
||||
store_raw_headers: bool = True
|
||||
store_raw_body: bool = False
|
||||
store_raw_message_ref: bool = True
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class StorageConfig:
|
||||
path: str = ".email-connect/state.sqlite"
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class ReportsConfig:
|
||||
output_dir: str = "reports"
|
||||
include_all_evidence: bool = True
|
||||
include_unknown_messages: bool = True
|
||||
timestamp_timezone: str = "UTC"
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class SourceConfig:
|
||||
fixture_dir: str | None = None
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class AppConfig:
|
||||
mailbox: MailboxConfig
|
||||
scan: ScanConfig
|
||||
storage: StorageConfig
|
||||
reports: ReportsConfig
|
||||
source: SourceConfig = SourceConfig()
|
||||
|
||||
|
||||
def load_config(path: str | Path) -> AppConfig:
|
||||
data = _load_mapping(Path(path))
|
||||
mailbox = data.get("mailbox", {})
|
||||
scan = data.get("scan", {})
|
||||
storage = data.get("storage", {})
|
||||
reports = data.get("reports", {})
|
||||
source = data.get("source", {})
|
||||
return AppConfig(
|
||||
mailbox=MailboxConfig(
|
||||
id=str(mailbox.get("id", "return-mailbox-default")),
|
||||
protocol=str(mailbox.get("protocol", "imap")),
|
||||
host=mailbox.get("host"),
|
||||
port=int(mailbox.get("port", 993)),
|
||||
tls=bool(mailbox.get("tls", True)),
|
||||
username_env=mailbox.get("username_env"),
|
||||
password_env=mailbox.get("password_env"),
|
||||
folder=str(mailbox.get("folder", "INBOX")),
|
||||
),
|
||||
scan=ScanConfig(
|
||||
mode=str(scan.get("mode", "incremental")),
|
||||
max_messages_per_run=int(scan.get("max_messages_per_run", 5000)),
|
||||
since=scan.get("since"),
|
||||
include_seen=bool(scan.get("include_seen", True)),
|
||||
mark_seen=bool(scan.get("mark_seen", False)),
|
||||
store_raw_headers=bool(scan.get("store_raw_headers", True)),
|
||||
store_raw_body=bool(scan.get("store_raw_body", False)),
|
||||
store_raw_message_ref=bool(scan.get("store_raw_message_ref", True)),
|
||||
),
|
||||
storage=StorageConfig(path=str(storage.get("path", ".email-connect/state.sqlite"))),
|
||||
reports=ReportsConfig(
|
||||
output_dir=str(reports.get("output_dir", "reports")),
|
||||
include_all_evidence=bool(reports.get("include_all_evidence", True)),
|
||||
include_unknown_messages=bool(reports.get("include_unknown_messages", True)),
|
||||
timestamp_timezone=str(reports.get("timestamp_timezone", "UTC")),
|
||||
),
|
||||
source=SourceConfig(fixture_dir=source.get("fixture_dir")),
|
||||
)
|
||||
|
||||
|
||||
def _load_mapping(path: Path) -> dict[str, Any]:
|
||||
text = path.read_text(encoding="utf-8")
|
||||
try:
|
||||
import yaml # type: ignore
|
||||
|
||||
loaded = yaml.safe_load(text)
|
||||
return loaded or {}
|
||||
except ModuleNotFoundError:
|
||||
return _parse_simple_yaml(text)
|
||||
|
||||
|
||||
def _parse_simple_yaml(text: str) -> dict[str, Any]:
|
||||
"""Parse the small YAML subset used by config/mailbox.example.yml."""
|
||||
|
||||
result: dict[str, Any] = {}
|
||||
current: dict[str, Any] | None = None
|
||||
for raw_line in text.splitlines():
|
||||
line = raw_line.split("#", 1)[0].rstrip()
|
||||
if not line.strip():
|
||||
continue
|
||||
if not line.startswith(" ") and line.endswith(":"):
|
||||
key = line[:-1].strip()
|
||||
current = {}
|
||||
result[key] = current
|
||||
continue
|
||||
if current is None or ":" not in line:
|
||||
raise ValueError(f"Unsupported YAML line: {raw_line}")
|
||||
key, value = line.strip().split(":", 1)
|
||||
current[key.strip()] = _parse_scalar(value.strip())
|
||||
return result
|
||||
|
||||
|
||||
def _parse_scalar(value: str) -> Any:
|
||||
if value == "" or value == "null":
|
||||
return None
|
||||
if value in {"true", "false"}:
|
||||
return value == "true"
|
||||
if value.startswith('"') and value.endswith('"'):
|
||||
return value[1:-1]
|
||||
try:
|
||||
return int(value)
|
||||
except ValueError:
|
||||
return value
|
||||
151
src/email_connect/evidence.py
Normal file
151
src/email_connect/evidence.py
Normal file
@@ -0,0 +1,151 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass
|
||||
from datetime import datetime
|
||||
from uuid import uuid5, NAMESPACE_URL
|
||||
|
||||
from .models import (
|
||||
AssessmentCategory,
|
||||
Confidence,
|
||||
EmailEvidenceCandidate,
|
||||
EvidenceStrength,
|
||||
MessageClass,
|
||||
ParsedMailboxMessage,
|
||||
)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class EvidenceMapping:
|
||||
event_type: str
|
||||
assessment_category: AssessmentCategory
|
||||
assessment_subclass: str
|
||||
evidence_strength: EvidenceStrength
|
||||
normalized_meaning: str
|
||||
|
||||
|
||||
EVIDENCE_MAPPINGS: dict[MessageClass, EvidenceMapping | None] = {
|
||||
MessageClass.HARD_BOUNCE: EvidenceMapping(
|
||||
"notification.endpoint.rejected_permanent",
|
||||
AssessmentCategory.FAIL,
|
||||
"fail.hard_bounce",
|
||||
EvidenceStrength.NEGATIVE,
|
||||
"Strong evidence that the recipient endpoint rejected the message permanently.",
|
||||
),
|
||||
MessageClass.SOFT_BOUNCE: EvidenceMapping(
|
||||
"notification.endpoint.rejected_temporary",
|
||||
AssessmentCategory.UNDEF,
|
||||
"undef.deferred",
|
||||
EvidenceStrength.AMBIGUOUS,
|
||||
"Temporary endpoint failure; retry or later evidence may change interpretation.",
|
||||
),
|
||||
MessageClass.DELAYED_DELIVERY_NOTICE: EvidenceMapping(
|
||||
"notification.endpoint.deferred",
|
||||
AssessmentCategory.UNDEF,
|
||||
"undef.deferred",
|
||||
EvidenceStrength.AMBIGUOUS,
|
||||
"Delivery is delayed and still uncertain.",
|
||||
),
|
||||
MessageClass.FINAL_DELIVERY_FAILURE: EvidenceMapping(
|
||||
"notification.endpoint.rejected_permanent",
|
||||
AssessmentCategory.FAIL,
|
||||
"fail.expired_without_delivery",
|
||||
EvidenceStrength.NEGATIVE,
|
||||
"Provider or endpoint gave up after retrying.",
|
||||
),
|
||||
MessageClass.OUT_OF_OFFICE: EvidenceMapping(
|
||||
"interaction.out_of_office_received",
|
||||
AssessmentCategory.UNDEF,
|
||||
"undef.out_of_office",
|
||||
EvidenceStrength.WEAK,
|
||||
"Mailbox auto-reply was observed; awareness and action remain uncertain.",
|
||||
),
|
||||
MessageClass.HUMAN_REPLY: EvidenceMapping(
|
||||
"interaction.reply_received",
|
||||
AssessmentCategory.SUCCESS,
|
||||
"success.reply_received",
|
||||
EvidenceStrength.MEDIUM,
|
||||
"A reply-like message was observed on the email channel.",
|
||||
),
|
||||
MessageClass.COMPLAINT_OR_ABUSE: EvidenceMapping(
|
||||
"notification.channel.complaint_received",
|
||||
AssessmentCategory.FAIL,
|
||||
"fail.complaint_received",
|
||||
EvidenceStrength.NEGATIVE,
|
||||
"Complaint or abuse feedback was observed.",
|
||||
),
|
||||
MessageClass.UNSUBSCRIBE_OR_OPT_OUT: EvidenceMapping(
|
||||
"notification.channel.unsubscribe_received",
|
||||
AssessmentCategory.FAIL,
|
||||
"fail.unsubscribed",
|
||||
EvidenceStrength.NEGATIVE,
|
||||
"Opt-out or unsubscribe signal was observed.",
|
||||
),
|
||||
MessageClass.UNKNOWN_RETURN_MESSAGE: EvidenceMapping(
|
||||
"notification.endpoint.unknown",
|
||||
AssessmentCategory.UNDEF,
|
||||
"undef.conflicting_evidence",
|
||||
EvidenceStrength.AMBIGUOUS,
|
||||
"Return-channel message could not be classified reliably.",
|
||||
),
|
||||
MessageClass.CHALLENGE_RESPONSE: EvidenceMapping(
|
||||
"interaction.unverified_actor_interaction",
|
||||
AssessmentCategory.UNDEF,
|
||||
"undef.identity_uncertain",
|
||||
EvidenceStrength.AMBIGUOUS,
|
||||
"Challenge-response or automated interaction was observed.",
|
||||
),
|
||||
MessageClass.UNRELATED_MESSAGE: None,
|
||||
MessageClass.PARSE_FAILED: None,
|
||||
}
|
||||
|
||||
|
||||
def evidence_deduplication_key(parsed: ParsedMailboxMessage, mapping: EvidenceMapping) -> str:
|
||||
parts = [
|
||||
parsed.mailbox_message_id,
|
||||
parsed.parser_version,
|
||||
mapping.event_type,
|
||||
parsed.affected_email_address or "",
|
||||
parsed.original_message_id or "",
|
||||
parsed.smtp_status_code or "",
|
||||
parsed.enhanced_status_code or "",
|
||||
parsed.reason_code or "",
|
||||
]
|
||||
return "|".join(parts)
|
||||
|
||||
|
||||
def candidate_from_parsed(
|
||||
parsed: ParsedMailboxMessage,
|
||||
*,
|
||||
raw_message_ref: str | None,
|
||||
observed_at: datetime,
|
||||
occurred_at: datetime | None,
|
||||
) -> EmailEvidenceCandidate | None:
|
||||
mapping = EVIDENCE_MAPPINGS.get(parsed.message_class)
|
||||
if mapping is None:
|
||||
return None
|
||||
|
||||
dedup_key = evidence_deduplication_key(parsed, mapping)
|
||||
candidate_id = str(uuid5(NAMESPACE_URL, "email-connect:evidence:" + dedup_key))
|
||||
return EmailEvidenceCandidate(
|
||||
evidence_candidate_id=candidate_id,
|
||||
mailbox_message_id=parsed.mailbox_message_id,
|
||||
parsed_message_id=parsed.parsed_message_id,
|
||||
event_type=mapping.event_type,
|
||||
assessment_category=mapping.assessment_category,
|
||||
assessment_subclass=mapping.assessment_subclass,
|
||||
affected_email_address=parsed.affected_email_address,
|
||||
original_message_id=parsed.original_message_id,
|
||||
confidence=parsed.confidence,
|
||||
evidence_strength=mapping.evidence_strength,
|
||||
occurred_at=occurred_at,
|
||||
observed_at=observed_at,
|
||||
deduplication_key=dedup_key,
|
||||
raw_message_ref=raw_message_ref,
|
||||
notes=[mapping.normalized_meaning, *parsed.notes],
|
||||
metadata={
|
||||
"message_class": parsed.message_class.value,
|
||||
"smtp_status_code": parsed.smtp_status_code,
|
||||
"enhanced_status_code": parsed.enhanced_status_code,
|
||||
"reason_code": parsed.reason_code,
|
||||
},
|
||||
)
|
||||
126
src/email_connect/models.py
Normal file
126
src/email_connect/models.py
Normal file
@@ -0,0 +1,126 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime
|
||||
from enum import StrEnum
|
||||
from typing import Any
|
||||
|
||||
|
||||
class MessageClass(StrEnum):
|
||||
HARD_BOUNCE = "hard_bounce"
|
||||
SOFT_BOUNCE = "soft_bounce"
|
||||
DELAYED_DELIVERY_NOTICE = "delayed_delivery_notice"
|
||||
FINAL_DELIVERY_FAILURE = "final_delivery_failure"
|
||||
OUT_OF_OFFICE = "out_of_office"
|
||||
HUMAN_REPLY = "human_reply"
|
||||
COMPLAINT_OR_ABUSE = "complaint_or_abuse"
|
||||
UNSUBSCRIBE_OR_OPT_OUT = "unsubscribe_or_opt_out"
|
||||
CHALLENGE_RESPONSE = "challenge_response"
|
||||
UNKNOWN_RETURN_MESSAGE = "unknown_return_message"
|
||||
UNRELATED_MESSAGE = "unrelated_message"
|
||||
PARSE_FAILED = "parse_failed"
|
||||
|
||||
|
||||
class AssessmentCategory(StrEnum):
|
||||
SUCCESS = "success"
|
||||
FAIL = "fail"
|
||||
UNDEF = "undef"
|
||||
|
||||
|
||||
class Confidence(StrEnum):
|
||||
LOW = "low"
|
||||
MEDIUM = "medium"
|
||||
HIGH = "high"
|
||||
|
||||
|
||||
class EvidenceStrength(StrEnum):
|
||||
NONE = "none"
|
||||
WEAK = "weak"
|
||||
MEDIUM = "medium"
|
||||
STRONG = "strong"
|
||||
NEGATIVE = "negative"
|
||||
AMBIGUOUS = "ambiguous"
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class MailboxScan:
|
||||
scan_id: str
|
||||
mailbox_id: str
|
||||
started_at: datetime
|
||||
finished_at: datetime | None
|
||||
scan_mode: str
|
||||
folder: str
|
||||
status: str
|
||||
messages_seen: int = 0
|
||||
messages_new: int = 0
|
||||
messages_parsed: int = 0
|
||||
evidence_events_created: int = 0
|
||||
report_path: str | None = None
|
||||
since: datetime | None = None
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class InboundMailboxMessage:
|
||||
mailbox_message_id: str
|
||||
mailbox_id: str
|
||||
imap_uid: str | None
|
||||
message_id_header: str | None
|
||||
received_at: datetime | None
|
||||
from_address: str | None
|
||||
to_addresses: list[str]
|
||||
subject: str | None
|
||||
raw_headers_ref: str | None
|
||||
raw_message_ref: str | None
|
||||
first_seen_at: datetime
|
||||
last_seen_at: datetime
|
||||
deduplication_key: str
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class ParsedMailboxMessage:
|
||||
parsed_message_id: str
|
||||
mailbox_message_id: str
|
||||
parser_version: str
|
||||
message_class: MessageClass
|
||||
affected_email_address: str | None
|
||||
original_message_id: str | None
|
||||
original_recipient: str | None
|
||||
smtp_status_code: str | None
|
||||
enhanced_status_code: str | None
|
||||
reason_code: str | None
|
||||
confidence: Confidence
|
||||
parsed_at: datetime
|
||||
notes: list[str] = field(default_factory=list)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class EmailEvidenceCandidate:
|
||||
evidence_candidate_id: str
|
||||
mailbox_message_id: str
|
||||
parsed_message_id: str
|
||||
event_type: str
|
||||
assessment_category: AssessmentCategory
|
||||
assessment_subclass: str
|
||||
affected_email_address: str | None
|
||||
original_message_id: str | None
|
||||
confidence: Confidence
|
||||
evidence_strength: EvidenceStrength
|
||||
occurred_at: datetime | None
|
||||
observed_at: datetime
|
||||
deduplication_key: str
|
||||
raw_message_ref: str | None
|
||||
notes: list[str] = field(default_factory=list)
|
||||
metadata: dict[str, Any] = field(default_factory=dict)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class EndpointQualityUpdate:
|
||||
affected_email_address: str
|
||||
reachability: str = "unknown"
|
||||
suppression_state: str = "unknown"
|
||||
last_success_at: datetime | None = None
|
||||
last_failure_at: datetime | None = None
|
||||
last_auto_reply_at: datetime | None = None
|
||||
reason_code: str | None = None
|
||||
confidence: Confidence = Confidence.LOW
|
||||
quality_signals: list[str] = field(default_factory=list)
|
||||
320
src/email_connect/parser.py
Normal file
320
src/email_connect/parser.py
Normal file
@@ -0,0 +1,320 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import hashlib
|
||||
import re
|
||||
from datetime import UTC, datetime
|
||||
from email import policy
|
||||
from email.parser import BytesParser
|
||||
from email.utils import getaddresses, parsedate_to_datetime
|
||||
from pathlib import Path
|
||||
from uuid import NAMESPACE_URL, uuid5
|
||||
|
||||
from .evidence import candidate_from_parsed
|
||||
from .models import (
|
||||
Confidence,
|
||||
EmailEvidenceCandidate,
|
||||
InboundMailboxMessage,
|
||||
MessageClass,
|
||||
ParsedMailboxMessage,
|
||||
)
|
||||
|
||||
PARSER_VERSION = "mailbox-scanner-v0.1"
|
||||
|
||||
EMAIL_RE = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}", re.IGNORECASE)
|
||||
ENHANCED_STATUS_RE = re.compile(r"\b[245]\.\d{1,3}\.\d{1,3}\b")
|
||||
SMTP_STATUS_RE = re.compile(r"\b[245]\d{2}\b")
|
||||
|
||||
|
||||
def parse_message_bytes(
|
||||
raw_bytes: bytes,
|
||||
*,
|
||||
mailbox_id: str,
|
||||
raw_message_ref: str | None,
|
||||
imap_uid: str | None = None,
|
||||
now: datetime | None = None,
|
||||
) -> tuple[InboundMailboxMessage, ParsedMailboxMessage, EmailEvidenceCandidate | None]:
|
||||
observed_at = now or datetime.now(UTC)
|
||||
msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
|
||||
|
||||
message_id = _clean_header(msg.get("Message-ID"))
|
||||
subject = _clean_header(msg.get("Subject"))
|
||||
from_address = _first_address(msg.get("From"))
|
||||
to_addresses = [addr for _, addr in getaddresses(msg.get_all("To", []))]
|
||||
received_at = _parse_date(msg.get("Date"))
|
||||
text = _extract_text(msg)
|
||||
headers_text = "\n".join(f"{key}: {value}" for key, value in msg.items())
|
||||
combined = f"{headers_text}\n\n{subject or ''}\n{text}"
|
||||
dedup_key = _message_dedup_key(
|
||||
mailbox_id=mailbox_id,
|
||||
imap_uid=imap_uid,
|
||||
message_id=message_id,
|
||||
received_at=received_at,
|
||||
from_address=from_address,
|
||||
subject=subject,
|
||||
body=text,
|
||||
)
|
||||
mailbox_message_id = str(uuid5(NAMESPACE_URL, "email-connect:message:" + dedup_key))
|
||||
inbound = InboundMailboxMessage(
|
||||
mailbox_message_id=mailbox_message_id,
|
||||
mailbox_id=mailbox_id,
|
||||
imap_uid=imap_uid,
|
||||
message_id_header=message_id,
|
||||
received_at=received_at,
|
||||
from_address=from_address,
|
||||
to_addresses=to_addresses,
|
||||
subject=subject,
|
||||
raw_headers_ref=raw_message_ref,
|
||||
raw_message_ref=raw_message_ref,
|
||||
first_seen_at=observed_at,
|
||||
last_seen_at=observed_at,
|
||||
deduplication_key=dedup_key,
|
||||
)
|
||||
|
||||
parsed = classify_message(inbound, combined, observed_at=observed_at)
|
||||
candidate = candidate_from_parsed(
|
||||
parsed,
|
||||
raw_message_ref=raw_message_ref,
|
||||
observed_at=observed_at,
|
||||
occurred_at=received_at,
|
||||
)
|
||||
return inbound, parsed, candidate
|
||||
|
||||
|
||||
def parse_message_file(
|
||||
path: str | Path,
|
||||
*,
|
||||
mailbox_id: str,
|
||||
now: datetime | None = None,
|
||||
) -> tuple[InboundMailboxMessage, ParsedMailboxMessage, EmailEvidenceCandidate | None]:
|
||||
file_path = Path(path)
|
||||
return parse_message_bytes(
|
||||
file_path.read_bytes(),
|
||||
mailbox_id=mailbox_id,
|
||||
raw_message_ref=str(file_path),
|
||||
now=now,
|
||||
)
|
||||
|
||||
|
||||
def classify_message(
|
||||
inbound: InboundMailboxMessage,
|
||||
combined_text: str,
|
||||
*,
|
||||
observed_at: datetime,
|
||||
) -> ParsedMailboxMessage:
|
||||
text = combined_text.lower()
|
||||
enhanced_status = _first_match(ENHANCED_STATUS_RE, combined_text)
|
||||
smtp_status = _first_match(SMTP_STATUS_RE, combined_text)
|
||||
affected = _extract_affected_recipient(combined_text)
|
||||
original_message_id = _extract_headerish(combined_text, "Original-Message-ID")
|
||||
notes: list[str] = []
|
||||
|
||||
message_class = MessageClass.UNRELATED_MESSAGE
|
||||
confidence = Confidence.LOW
|
||||
reason_code: str | None = None
|
||||
|
||||
if _contains_any(text, ["feedback loop", "abuse report", "spam complaint", "complaint notification"]):
|
||||
message_class = MessageClass.COMPLAINT_OR_ABUSE
|
||||
confidence = Confidence.HIGH
|
||||
reason_code = "complaint"
|
||||
elif _contains_any(text, ["unsubscribe", "opt-out", "opt out", "remove me"]):
|
||||
message_class = MessageClass.UNSUBSCRIBE_OR_OPT_OUT
|
||||
confidence = Confidence.MEDIUM
|
||||
reason_code = "unsubscribe"
|
||||
elif _is_out_of_office(text):
|
||||
message_class = MessageClass.OUT_OF_OFFICE
|
||||
confidence = Confidence.MEDIUM
|
||||
reason_code = "auto_reply"
|
||||
elif _contains_any(text, ["will keep trying", "delivery delayed", "message delayed", "not yet delivered"]):
|
||||
message_class = MessageClass.DELAYED_DELIVERY_NOTICE
|
||||
confidence = Confidence.HIGH
|
||||
reason_code = "delayed"
|
||||
elif _is_dsn_like(text):
|
||||
message_class, confidence, reason_code = _classify_dsn(text, smtp_status, enhanced_status)
|
||||
elif _looks_like_human_reply(inbound, text):
|
||||
message_class = MessageClass.HUMAN_REPLY
|
||||
confidence = Confidence.MEDIUM
|
||||
reason_code = "reply"
|
||||
elif _looks_return_related(text):
|
||||
message_class = MessageClass.UNKNOWN_RETURN_MESSAGE
|
||||
confidence = Confidence.LOW
|
||||
reason_code = "unknown_return"
|
||||
notes.append("Return-channel message did not match a reliable classifier.")
|
||||
|
||||
if enhanced_status:
|
||||
notes.append(f"enhanced_status={enhanced_status}")
|
||||
if smtp_status:
|
||||
notes.append(f"smtp_status={smtp_status}")
|
||||
|
||||
parsed_id_basis = "|".join([inbound.mailbox_message_id, PARSER_VERSION, message_class.value])
|
||||
return ParsedMailboxMessage(
|
||||
parsed_message_id=str(uuid5(NAMESPACE_URL, "email-connect:parsed:" + parsed_id_basis)),
|
||||
mailbox_message_id=inbound.mailbox_message_id,
|
||||
parser_version=PARSER_VERSION,
|
||||
message_class=message_class,
|
||||
affected_email_address=affected,
|
||||
original_message_id=original_message_id,
|
||||
original_recipient=affected,
|
||||
smtp_status_code=smtp_status,
|
||||
enhanced_status_code=enhanced_status,
|
||||
reason_code=reason_code,
|
||||
confidence=confidence,
|
||||
parsed_at=observed_at,
|
||||
notes=notes,
|
||||
)
|
||||
|
||||
|
||||
def _classify_dsn(
|
||||
text: str,
|
||||
smtp_status: str | None,
|
||||
enhanced_status: str | None,
|
||||
) -> tuple[MessageClass, Confidence, str]:
|
||||
if _contains_any(text, ["final failure", "giving up", "could not deliver after"]):
|
||||
return MessageClass.FINAL_DELIVERY_FAILURE, Confidence.HIGH, "expired_without_delivery"
|
||||
if (enhanced_status and enhanced_status.startswith("5.")) or (smtp_status and smtp_status.startswith("5")):
|
||||
return MessageClass.HARD_BOUNCE, Confidence.HIGH, "hard_bounce"
|
||||
if _contains_any(text, ["unknown user", "mailbox not found", "user unknown", "domain not found"]):
|
||||
return MessageClass.HARD_BOUNCE, Confidence.HIGH, "hard_bounce"
|
||||
if (enhanced_status and enhanced_status.startswith("4.")) or (smtp_status and smtp_status.startswith("4")):
|
||||
return MessageClass.SOFT_BOUNCE, Confidence.HIGH, "soft_bounce"
|
||||
if _contains_any(text, ["temporary failure", "mailbox full", "greylist", "try again later"]):
|
||||
return MessageClass.SOFT_BOUNCE, Confidence.MEDIUM, "soft_bounce"
|
||||
return MessageClass.UNKNOWN_RETURN_MESSAGE, Confidence.LOW, "unknown_dsn"
|
||||
|
||||
|
||||
def _extract_text(msg) -> str:
|
||||
chunks: list[str] = []
|
||||
if msg.is_multipart():
|
||||
for part in msg.walk():
|
||||
if part.get_content_maintype() == "multipart":
|
||||
continue
|
||||
if part.get_content_type() in {"text/plain", "message/delivery-status"}:
|
||||
try:
|
||||
chunks.append(str(part.get_content()))
|
||||
except Exception:
|
||||
continue
|
||||
else:
|
||||
try:
|
||||
chunks.append(str(msg.get_content()))
|
||||
except Exception:
|
||||
payload = msg.get_payload(decode=True)
|
||||
if payload:
|
||||
chunks.append(payload.decode(errors="replace"))
|
||||
return "\n".join(chunks)
|
||||
|
||||
|
||||
def _message_dedup_key(
|
||||
*,
|
||||
mailbox_id: str,
|
||||
imap_uid: str | None,
|
||||
message_id: str | None,
|
||||
received_at: datetime | None,
|
||||
from_address: str | None,
|
||||
subject: str | None,
|
||||
body: str,
|
||||
) -> str:
|
||||
body_hash = hashlib.sha256(body.encode("utf-8", errors="replace")).hexdigest()[:16]
|
||||
return "|".join(
|
||||
[
|
||||
mailbox_id,
|
||||
imap_uid or "",
|
||||
message_id or "",
|
||||
received_at.isoformat() if received_at else "",
|
||||
from_address or "",
|
||||
hashlib.sha256((subject or "").encode()).hexdigest()[:12],
|
||||
body_hash,
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
def _clean_header(value: str | None) -> str | None:
|
||||
if value is None:
|
||||
return None
|
||||
cleaned = str(value).strip()
|
||||
return cleaned or None
|
||||
|
||||
|
||||
def _first_address(value: str | None) -> str | None:
|
||||
if not value:
|
||||
return None
|
||||
addresses = getaddresses([value])
|
||||
return addresses[0][1] if addresses else None
|
||||
|
||||
|
||||
def _parse_date(value: str | None) -> datetime | None:
|
||||
if not value:
|
||||
return None
|
||||
try:
|
||||
parsed = parsedate_to_datetime(value)
|
||||
except (TypeError, ValueError):
|
||||
return None
|
||||
if parsed.tzinfo is None:
|
||||
return parsed.replace(tzinfo=UTC)
|
||||
return parsed.astimezone(UTC)
|
||||
|
||||
|
||||
def _first_match(pattern: re.Pattern[str], text: str) -> str | None:
|
||||
match = pattern.search(text)
|
||||
return match.group(0) if match else None
|
||||
|
||||
|
||||
def _extract_headerish(text: str, name: str) -> str | None:
|
||||
match = re.search(rf"^{re.escape(name)}:\s*(.+)$", text, re.IGNORECASE | re.MULTILINE)
|
||||
return match.group(1).strip() if match else None
|
||||
|
||||
|
||||
def _extract_affected_recipient(text: str) -> str | None:
|
||||
for name in ["Final-Recipient", "Original-Recipient", "X-Failed-Recipients", "Failed-Recipient"]:
|
||||
value = _extract_headerish(text, name)
|
||||
if value:
|
||||
match = EMAIL_RE.search(value)
|
||||
if match:
|
||||
return match.group(0).lower()
|
||||
match = EMAIL_RE.search(text)
|
||||
return match.group(0).lower() if match else None
|
||||
|
||||
|
||||
def _contains_any(text: str, needles: list[str]) -> bool:
|
||||
return any(needle in text for needle in needles)
|
||||
|
||||
|
||||
def _is_dsn_like(text: str) -> bool:
|
||||
return _contains_any(
|
||||
text,
|
||||
[
|
||||
"delivery status notification",
|
||||
"delivery failure",
|
||||
"undeliverable",
|
||||
"returned mail",
|
||||
"mail delivery subsystem",
|
||||
"final-recipient",
|
||||
"diagnostic-code",
|
||||
"status:",
|
||||
],
|
||||
)
|
||||
|
||||
|
||||
def _is_out_of_office(text: str) -> bool:
|
||||
return _contains_any(
|
||||
text,
|
||||
[
|
||||
"out of office",
|
||||
"auto-reply",
|
||||
"autoreply",
|
||||
"automatic reply",
|
||||
"vacation",
|
||||
"abwesenheitsnotiz",
|
||||
"nicht im buero",
|
||||
"nicht im büro",
|
||||
],
|
||||
)
|
||||
|
||||
|
||||
def _looks_like_human_reply(inbound: InboundMailboxMessage, text: str) -> bool:
|
||||
subject = (inbound.subject or "").lower()
|
||||
if _contains_any(text, ["auto-submitted: auto-replied", "x-autoreply", "auto-generated"]):
|
||||
return False
|
||||
return subject.startswith("re:") or _contains_any(text, ["thanks", "thank you", "i confirm", "received"])
|
||||
|
||||
|
||||
def _looks_return_related(text: str) -> bool:
|
||||
return _contains_any(text, ["delivery", "mailbox", "recipient", "message", "smtp", "unsubscribe", "reply"])
|
||||
107
src/email_connect/reporting.py
Normal file
107
src/email_connect/reporting.py
Normal file
@@ -0,0 +1,107 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import csv
|
||||
import json
|
||||
from datetime import UTC, datetime
|
||||
from pathlib import Path
|
||||
|
||||
REPORT_COLUMNS = [
|
||||
"report_generated_at",
|
||||
"scan_id",
|
||||
"mailbox_id",
|
||||
"mailbox_message_id",
|
||||
"mailbox_received_at",
|
||||
"source_from",
|
||||
"source_to",
|
||||
"source_subject",
|
||||
"message_id_header",
|
||||
"detected_message_class",
|
||||
"normalized_event_type",
|
||||
"assessment_category",
|
||||
"assessment_subclass",
|
||||
"affected_email_address",
|
||||
"original_message_id",
|
||||
"original_recipient",
|
||||
"smtp_status_code",
|
||||
"enhanced_status_code",
|
||||
"reason_code",
|
||||
"confidence",
|
||||
"evidence_strength",
|
||||
"occurred_at",
|
||||
"observed_at",
|
||||
"first_seen_at",
|
||||
"last_seen_at",
|
||||
"deduplication_key",
|
||||
"raw_message_ref",
|
||||
"notes",
|
||||
]
|
||||
|
||||
|
||||
def report_filename(now: datetime | None = None) -> str:
|
||||
stamp = (now or datetime.now(UTC)).strftime("%Y%m%d-%H%M%S")
|
||||
return f"email-channel-evidence-report-{stamp}.csv"
|
||||
|
||||
|
||||
def write_evidence_report(
|
||||
rows: list[dict],
|
||||
*,
|
||||
output_dir: str | Path,
|
||||
scan_id: str,
|
||||
mailbox_id: str,
|
||||
generated_at: datetime | None = None,
|
||||
) -> Path:
|
||||
generated = generated_at or datetime.now(UTC)
|
||||
out_dir = Path(output_dir)
|
||||
out_dir.mkdir(parents=True, exist_ok=True)
|
||||
path = out_dir / report_filename(generated)
|
||||
|
||||
with path.open("w", newline="", encoding="utf-8") as fh:
|
||||
writer = csv.DictWriter(fh, fieldnames=REPORT_COLUMNS)
|
||||
writer.writeheader()
|
||||
for row in rows:
|
||||
writer.writerow(_report_row(row, scan_id=scan_id, mailbox_id=mailbox_id, generated_at=generated))
|
||||
return path
|
||||
|
||||
|
||||
def _report_row(row: dict, *, scan_id: str, mailbox_id: str, generated_at: datetime) -> dict:
|
||||
metadata = _json(row.get("metadata_json"))
|
||||
notes = _json(row.get("notes_json"))
|
||||
return {
|
||||
"report_generated_at": generated_at.isoformat(),
|
||||
"scan_id": scan_id,
|
||||
"mailbox_id": mailbox_id,
|
||||
"mailbox_message_id": row.get("mailbox_message_id", ""),
|
||||
"mailbox_received_at": row.get("occurred_at") or "",
|
||||
"source_from": metadata.get("source_from", ""),
|
||||
"source_to": metadata.get("source_to", ""),
|
||||
"source_subject": metadata.get("source_subject", ""),
|
||||
"message_id_header": metadata.get("message_id_header", ""),
|
||||
"detected_message_class": metadata.get("message_class", ""),
|
||||
"normalized_event_type": row.get("event_type", ""),
|
||||
"assessment_category": row.get("assessment_category", ""),
|
||||
"assessment_subclass": row.get("assessment_subclass", ""),
|
||||
"affected_email_address": row.get("affected_email_address") or "",
|
||||
"original_message_id": row.get("original_message_id") or "",
|
||||
"original_recipient": metadata.get("original_recipient", ""),
|
||||
"smtp_status_code": metadata.get("smtp_status_code") or "",
|
||||
"enhanced_status_code": metadata.get("enhanced_status_code") or "",
|
||||
"reason_code": metadata.get("reason_code") or "",
|
||||
"confidence": row.get("confidence", ""),
|
||||
"evidence_strength": row.get("evidence_strength", ""),
|
||||
"occurred_at": row.get("occurred_at") or "",
|
||||
"observed_at": row.get("observed_at") or "",
|
||||
"first_seen_at": metadata.get("first_seen_at", ""),
|
||||
"last_seen_at": metadata.get("last_seen_at", ""),
|
||||
"deduplication_key": row.get("deduplication_key", ""),
|
||||
"raw_message_ref": row.get("raw_message_ref") or "",
|
||||
"notes": "; ".join(str(item) for item in notes),
|
||||
}
|
||||
|
||||
|
||||
def _json(value: str | None) -> dict | list:
|
||||
if not value:
|
||||
return {}
|
||||
try:
|
||||
return json.loads(value)
|
||||
except json.JSONDecodeError:
|
||||
return {}
|
||||
107
src/email_connect/scanner.py
Normal file
107
src/email_connect/scanner.py
Normal file
@@ -0,0 +1,107 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass
|
||||
from datetime import UTC, datetime
|
||||
from pathlib import Path
|
||||
from uuid import uuid4
|
||||
|
||||
from .config import AppConfig
|
||||
from .models import MailboxScan
|
||||
from .parser import parse_message_file
|
||||
from .reporting import write_evidence_report
|
||||
from .storage import StateStore
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class ScanResult:
|
||||
scan: MailboxScan
|
||||
report_path: Path | None
|
||||
|
||||
|
||||
def scan_mailbox(
|
||||
config: AppConfig,
|
||||
*,
|
||||
output_dir: str | None = None,
|
||||
full_rescan: bool = False,
|
||||
report_only_new: bool = False,
|
||||
dry_run: bool = False,
|
||||
fixture_dir: str | None = None,
|
||||
) -> ScanResult:
|
||||
started_at = datetime.now(UTC)
|
||||
scan_id = str(uuid4())
|
||||
source_dir = fixture_dir or config.source.fixture_dir
|
||||
if not source_dir:
|
||||
raise ValueError("No source.fixture_dir configured yet. IMAP connector is planned in EMAIL-WP-0002-T02.")
|
||||
|
||||
files = sorted(Path(source_dir).glob("*.eml"))
|
||||
if config.scan.max_messages_per_run:
|
||||
files = files[: config.scan.max_messages_per_run]
|
||||
|
||||
store = StateStore(config.storage.path)
|
||||
messages_seen = 0
|
||||
messages_new = 0
|
||||
messages_parsed = 0
|
||||
evidence_created = 0
|
||||
try:
|
||||
for path in files:
|
||||
messages_seen += 1
|
||||
inbound, parsed, candidate = parse_message_file(path, mailbox_id=config.mailbox.id)
|
||||
if dry_run:
|
||||
messages_parsed += 1
|
||||
continue
|
||||
is_new = store.upsert_message(inbound)
|
||||
messages_new += int(is_new)
|
||||
store.insert_parsed(parsed)
|
||||
messages_parsed += 1
|
||||
if candidate is not None:
|
||||
candidate = _enrich_candidate(candidate, inbound, parsed)
|
||||
evidence_created += int(store.insert_evidence(candidate))
|
||||
|
||||
report_path = None
|
||||
if not dry_run:
|
||||
report_rows = store.evidence_rows()
|
||||
report_path = write_evidence_report(
|
||||
report_rows,
|
||||
output_dir=output_dir or config.reports.output_dir,
|
||||
scan_id=scan_id,
|
||||
mailbox_id=config.mailbox.id,
|
||||
)
|
||||
finished_at = datetime.now(UTC)
|
||||
scan = MailboxScan(
|
||||
scan_id=scan_id,
|
||||
mailbox_id=config.mailbox.id,
|
||||
started_at=started_at,
|
||||
finished_at=finished_at,
|
||||
scan_mode="full_rescan" if full_rescan else config.scan.mode,
|
||||
folder=config.mailbox.folder,
|
||||
status="completed",
|
||||
messages_seen=messages_seen,
|
||||
messages_new=messages_new,
|
||||
messages_parsed=messages_parsed,
|
||||
evidence_events_created=evidence_created,
|
||||
report_path=str(report_path) if report_path else None,
|
||||
)
|
||||
if not dry_run:
|
||||
store.insert_scan(scan)
|
||||
return ScanResult(scan=scan, report_path=report_path)
|
||||
finally:
|
||||
store.close()
|
||||
|
||||
|
||||
def _enrich_candidate(candidate, inbound, parsed):
|
||||
metadata = {
|
||||
**candidate.metadata,
|
||||
"source_from": inbound.from_address,
|
||||
"source_to": ", ".join(inbound.to_addresses),
|
||||
"source_subject": inbound.subject,
|
||||
"message_id_header": inbound.message_id_header,
|
||||
"original_recipient": parsed.original_recipient,
|
||||
"first_seen_at": inbound.first_seen_at.isoformat(),
|
||||
"last_seen_at": inbound.last_seen_at.isoformat(),
|
||||
}
|
||||
return type(candidate)(
|
||||
**{
|
||||
**candidate.__dict__,
|
||||
"metadata": metadata,
|
||||
}
|
||||
)
|
||||
212
src/email_connect/storage.py
Normal file
212
src/email_connect/storage.py
Normal file
@@ -0,0 +1,212 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import sqlite3
|
||||
from datetime import UTC, datetime
|
||||
from pathlib import Path
|
||||
|
||||
from .models import EmailEvidenceCandidate, InboundMailboxMessage, MailboxScan, ParsedMailboxMessage
|
||||
|
||||
|
||||
class StateStore:
|
||||
def __init__(self, path: str | Path) -> None:
|
||||
self.path = Path(path)
|
||||
self.path.parent.mkdir(parents=True, exist_ok=True)
|
||||
self.conn = sqlite3.connect(self.path)
|
||||
self.conn.row_factory = sqlite3.Row
|
||||
self.init_schema()
|
||||
|
||||
def close(self) -> None:
|
||||
self.conn.close()
|
||||
|
||||
def init_schema(self) -> None:
|
||||
self.conn.executescript(
|
||||
"""
|
||||
create table if not exists mailbox_scans (
|
||||
scan_id text primary key,
|
||||
mailbox_id text not null,
|
||||
started_at text not null,
|
||||
finished_at text,
|
||||
scan_mode text not null,
|
||||
folder text not null,
|
||||
status text not null,
|
||||
messages_seen integer not null,
|
||||
messages_new integer not null,
|
||||
messages_parsed integer not null,
|
||||
evidence_events_created integer not null,
|
||||
report_path text,
|
||||
since text
|
||||
);
|
||||
|
||||
create table if not exists mailbox_messages (
|
||||
mailbox_message_id text primary key,
|
||||
mailbox_id text not null,
|
||||
imap_uid text,
|
||||
message_id_header text,
|
||||
received_at text,
|
||||
from_address text,
|
||||
to_addresses_json text not null,
|
||||
subject text,
|
||||
raw_headers_ref text,
|
||||
raw_message_ref text,
|
||||
first_seen_at text not null,
|
||||
last_seen_at text not null,
|
||||
deduplication_key text not null unique
|
||||
);
|
||||
|
||||
create table if not exists parsed_messages (
|
||||
parsed_message_id text primary key,
|
||||
mailbox_message_id text not null,
|
||||
parser_version text not null,
|
||||
message_class text not null,
|
||||
affected_email_address text,
|
||||
original_message_id text,
|
||||
original_recipient text,
|
||||
smtp_status_code text,
|
||||
enhanced_status_code text,
|
||||
reason_code text,
|
||||
confidence text not null,
|
||||
parsed_at text not null,
|
||||
notes_json text not null
|
||||
);
|
||||
|
||||
create table if not exists evidence_candidates (
|
||||
evidence_candidate_id text primary key,
|
||||
mailbox_message_id text not null,
|
||||
parsed_message_id text not null,
|
||||
event_type text not null,
|
||||
assessment_category text not null,
|
||||
assessment_subclass text not null,
|
||||
affected_email_address text,
|
||||
original_message_id text,
|
||||
confidence text not null,
|
||||
evidence_strength text not null,
|
||||
occurred_at text,
|
||||
observed_at text not null,
|
||||
deduplication_key text not null unique,
|
||||
raw_message_ref text,
|
||||
notes_json text not null,
|
||||
metadata_json text not null
|
||||
);
|
||||
"""
|
||||
)
|
||||
self.conn.commit()
|
||||
|
||||
def upsert_message(self, message: InboundMailboxMessage) -> bool:
|
||||
existing = self.conn.execute(
|
||||
"select mailbox_message_id from mailbox_messages where deduplication_key = ?",
|
||||
(message.deduplication_key,),
|
||||
).fetchone()
|
||||
if existing:
|
||||
self.conn.execute(
|
||||
"update mailbox_messages set last_seen_at = ? where deduplication_key = ?",
|
||||
(_dt(message.last_seen_at), message.deduplication_key),
|
||||
)
|
||||
self.conn.commit()
|
||||
return False
|
||||
self.conn.execute(
|
||||
"""
|
||||
insert into mailbox_messages values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
|
||||
""",
|
||||
(
|
||||
message.mailbox_message_id,
|
||||
message.mailbox_id,
|
||||
message.imap_uid,
|
||||
message.message_id_header,
|
||||
_dt(message.received_at),
|
||||
message.from_address,
|
||||
json.dumps(message.to_addresses),
|
||||
message.subject,
|
||||
message.raw_headers_ref,
|
||||
message.raw_message_ref,
|
||||
_dt(message.first_seen_at),
|
||||
_dt(message.last_seen_at),
|
||||
message.deduplication_key,
|
||||
),
|
||||
)
|
||||
self.conn.commit()
|
||||
return True
|
||||
|
||||
def insert_parsed(self, parsed: ParsedMailboxMessage) -> None:
|
||||
self.conn.execute(
|
||||
"""
|
||||
insert or replace into parsed_messages values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
|
||||
""",
|
||||
(
|
||||
parsed.parsed_message_id,
|
||||
parsed.mailbox_message_id,
|
||||
parsed.parser_version,
|
||||
parsed.message_class.value,
|
||||
parsed.affected_email_address,
|
||||
parsed.original_message_id,
|
||||
parsed.original_recipient,
|
||||
parsed.smtp_status_code,
|
||||
parsed.enhanced_status_code,
|
||||
parsed.reason_code,
|
||||
parsed.confidence.value,
|
||||
_dt(parsed.parsed_at),
|
||||
json.dumps(parsed.notes),
|
||||
),
|
||||
)
|
||||
self.conn.commit()
|
||||
|
||||
def insert_evidence(self, candidate: EmailEvidenceCandidate) -> bool:
|
||||
try:
|
||||
self.conn.execute(
|
||||
"""
|
||||
insert into evidence_candidates values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
|
||||
""",
|
||||
(
|
||||
candidate.evidence_candidate_id,
|
||||
candidate.mailbox_message_id,
|
||||
candidate.parsed_message_id,
|
||||
candidate.event_type,
|
||||
candidate.assessment_category.value,
|
||||
candidate.assessment_subclass,
|
||||
candidate.affected_email_address,
|
||||
candidate.original_message_id,
|
||||
candidate.confidence.value,
|
||||
candidate.evidence_strength.value,
|
||||
_dt(candidate.occurred_at),
|
||||
_dt(candidate.observed_at),
|
||||
candidate.deduplication_key,
|
||||
candidate.raw_message_ref,
|
||||
json.dumps(candidate.notes),
|
||||
json.dumps(candidate.metadata),
|
||||
),
|
||||
)
|
||||
except sqlite3.IntegrityError:
|
||||
return False
|
||||
self.conn.commit()
|
||||
return True
|
||||
|
||||
def insert_scan(self, scan: MailboxScan) -> None:
|
||||
self.conn.execute(
|
||||
"""
|
||||
insert or replace into mailbox_scans values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
|
||||
""",
|
||||
(
|
||||
scan.scan_id,
|
||||
scan.mailbox_id,
|
||||
_dt(scan.started_at),
|
||||
_dt(scan.finished_at),
|
||||
scan.scan_mode,
|
||||
scan.folder,
|
||||
scan.status,
|
||||
scan.messages_seen,
|
||||
scan.messages_new,
|
||||
scan.messages_parsed,
|
||||
scan.evidence_events_created,
|
||||
scan.report_path,
|
||||
_dt(scan.since),
|
||||
),
|
||||
)
|
||||
self.conn.commit()
|
||||
|
||||
def evidence_rows(self) -> list[dict]:
|
||||
rows = self.conn.execute("select * from evidence_candidates order by observed_at, event_type").fetchall()
|
||||
return [dict(row) for row in rows]
|
||||
|
||||
|
||||
def _dt(value: datetime | None) -> str | None:
|
||||
return value.isoformat() if value else None
|
||||
12
tests/fixtures/mailbox/hard_bounce.eml
vendored
Normal file
12
tests/fixtures/mailbox/hard_bounce.eml
vendored
Normal file
@@ -0,0 +1,12 @@
|
||||
From: Mail Delivery Subsystem <mailer-daemon@example.net>
|
||||
To: sender@example.com
|
||||
Subject: Delivery Status Notification (Failure)
|
||||
Date: Tue, 02 Jun 2026 10:00:00 +0000
|
||||
Message-ID: <hard-bounce@example.net>
|
||||
Content-Type: text/plain; charset=utf-8
|
||||
|
||||
Delivery failure.
|
||||
Final-Recipient: rfc822; missing@example.com
|
||||
Action: failed
|
||||
Status: 5.1.1
|
||||
Diagnostic-Code: smtp; 550 5.1.1 User unknown
|
||||
8
tests/fixtures/mailbox/human_reply.eml
vendored
Normal file
8
tests/fixtures/mailbox/human_reply.eml
vendored
Normal file
@@ -0,0 +1,8 @@
|
||||
From: Recipient <recipient@example.com>
|
||||
To: sender@example.com
|
||||
Subject: Re: Your notification
|
||||
Date: Tue, 02 Jun 2026 10:03:00 +0000
|
||||
Message-ID: <reply@example.com>
|
||||
Content-Type: text/plain; charset=utf-8
|
||||
|
||||
Thanks, I received this and will review it today.
|
||||
9
tests/fixtures/mailbox/out_of_office.eml
vendored
Normal file
9
tests/fixtures/mailbox/out_of_office.eml
vendored
Normal file
@@ -0,0 +1,9 @@
|
||||
From: Recipient <recipient@example.com>
|
||||
To: sender@example.com
|
||||
Subject: Auto-reply: Out of office
|
||||
Date: Tue, 02 Jun 2026 10:02:00 +0000
|
||||
Message-ID: <ooo@example.com>
|
||||
Auto-Submitted: auto-replied
|
||||
Content-Type: text/plain; charset=utf-8
|
||||
|
||||
I am out of office until next week.
|
||||
12
tests/fixtures/mailbox/soft_bounce.eml
vendored
Normal file
12
tests/fixtures/mailbox/soft_bounce.eml
vendored
Normal file
@@ -0,0 +1,12 @@
|
||||
From: Mail Delivery Subsystem <mailer-daemon@example.net>
|
||||
To: sender@example.com
|
||||
Subject: Delivery temporarily delayed
|
||||
Date: Tue, 02 Jun 2026 10:01:00 +0000
|
||||
Message-ID: <soft-bounce@example.net>
|
||||
Content-Type: text/plain; charset=utf-8
|
||||
|
||||
Delivery Status Notification.
|
||||
Final-Recipient: rfc822; full@example.com
|
||||
Action: delayed
|
||||
Status: 4.2.2
|
||||
Diagnostic-Code: smtp; 452 4.2.2 Mailbox full
|
||||
43
tests/test_parser.py
Normal file
43
tests/test_parser.py
Normal file
@@ -0,0 +1,43 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import unittest
|
||||
from pathlib import Path
|
||||
|
||||
from email_connect.models import MessageClass
|
||||
from email_connect.parser import parse_message_file
|
||||
|
||||
|
||||
FIXTURES = Path(__file__).parent / "fixtures" / "mailbox"
|
||||
|
||||
|
||||
class ParserTests(unittest.TestCase):
|
||||
def test_hard_bounce_maps_to_permanent_rejection(self) -> None:
|
||||
_inbound, parsed, candidate = parse_message_file(FIXTURES / "hard_bounce.eml", mailbox_id="test")
|
||||
self.assertEqual(parsed.message_class, MessageClass.HARD_BOUNCE)
|
||||
self.assertIsNotNone(candidate)
|
||||
self.assertEqual(candidate.event_type, "notification.endpoint.rejected_permanent")
|
||||
self.assertEqual(candidate.assessment_subclass, "fail.hard_bounce")
|
||||
|
||||
def test_soft_bounce_maps_to_temporary_rejection(self) -> None:
|
||||
_inbound, parsed, candidate = parse_message_file(FIXTURES / "soft_bounce.eml", mailbox_id="test")
|
||||
self.assertEqual(parsed.message_class, MessageClass.SOFT_BOUNCE)
|
||||
self.assertIsNotNone(candidate)
|
||||
self.assertEqual(candidate.event_type, "notification.endpoint.rejected_temporary")
|
||||
|
||||
def test_out_of_office_stays_undef(self) -> None:
|
||||
_inbound, parsed, candidate = parse_message_file(FIXTURES / "out_of_office.eml", mailbox_id="test")
|
||||
self.assertEqual(parsed.message_class, MessageClass.OUT_OF_OFFICE)
|
||||
self.assertIsNotNone(candidate)
|
||||
self.assertEqual(candidate.assessment_category.value, "undef")
|
||||
self.assertEqual(candidate.event_type, "interaction.out_of_office_received")
|
||||
|
||||
def test_human_reply_is_email_channel_success_only(self) -> None:
|
||||
_inbound, parsed, candidate = parse_message_file(FIXTURES / "human_reply.eml", mailbox_id="test")
|
||||
self.assertEqual(parsed.message_class, MessageClass.HUMAN_REPLY)
|
||||
self.assertIsNotNone(candidate)
|
||||
self.assertEqual(candidate.event_type, "interaction.reply_received")
|
||||
self.assertEqual(candidate.assessment_subclass, "success.reply_received")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
37
tests/test_scanner.py
Normal file
37
tests/test_scanner.py
Normal file
@@ -0,0 +1,37 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import tempfile
|
||||
import unittest
|
||||
from pathlib import Path
|
||||
|
||||
from email_connect.config import AppConfig, MailboxConfig, ReportsConfig, ScanConfig, SourceConfig, StorageConfig
|
||||
from email_connect.scanner import scan_mailbox
|
||||
|
||||
|
||||
FIXTURES = Path(__file__).parent / "fixtures" / "mailbox"
|
||||
|
||||
|
||||
class ScannerTests(unittest.TestCase):
|
||||
def test_scan_fixture_directory_writes_report_and_deduplicates(self) -> None:
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
root = Path(tmp)
|
||||
config = AppConfig(
|
||||
mailbox=MailboxConfig(id="test-mailbox", protocol="fixture"),
|
||||
scan=ScanConfig(),
|
||||
storage=StorageConfig(path=str(root / "state.sqlite")),
|
||||
reports=ReportsConfig(output_dir=str(root / "reports")),
|
||||
source=SourceConfig(fixture_dir=str(FIXTURES)),
|
||||
)
|
||||
first = scan_mailbox(config)
|
||||
second = scan_mailbox(config)
|
||||
|
||||
self.assertEqual(first.scan.messages_seen, 4)
|
||||
self.assertEqual(first.scan.messages_new, 4)
|
||||
self.assertGreaterEqual(first.scan.evidence_events_created, 4)
|
||||
self.assertEqual(second.scan.messages_new, 0)
|
||||
self.assertEqual(second.scan.evidence_events_created, 0)
|
||||
self.assertTrue(first.report_path and first.report_path.exists())
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
@@ -4,7 +4,7 @@ type: workplan
|
||||
title: "Repository Onboarding and Implementation Foundation"
|
||||
domain: custodian
|
||||
repo: email-connect
|
||||
status: active
|
||||
status: finished
|
||||
owner: codex
|
||||
topic_slug: custodian
|
||||
created: "2026-06-02"
|
||||
@@ -55,7 +55,7 @@ Done when:
|
||||
|
||||
```task
|
||||
id: EMAIL-WP-0001-T02
|
||||
status: todo
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "fdfd8b96-7326-414f-8126-79bb3a21b950"
|
||||
```
|
||||
@@ -76,7 +76,7 @@ The architecture note should cover:
|
||||
|
||||
```task
|
||||
id: EMAIL-WP-0001-T03
|
||||
status: todo
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "ef1eb769-dfa0-4b46-8633-274d90962423"
|
||||
```
|
||||
@@ -105,7 +105,7 @@ click telemetry as proof of human awareness or result satisfaction.
|
||||
|
||||
```task
|
||||
id: EMAIL-WP-0001-T04
|
||||
status: todo
|
||||
status: done
|
||||
priority: medium
|
||||
state_hub_task_id: "4b94e544-5aad-4c38-8fe3-eed17af79971"
|
||||
```
|
||||
|
||||
@@ -4,7 +4,7 @@ type: workplan
|
||||
title: "MVP Mailbox Evidence Scanner"
|
||||
domain: custodian
|
||||
repo: email-connect
|
||||
status: ready
|
||||
status: active
|
||||
owner: codex
|
||||
topic_slug: custodian
|
||||
created: "2026-06-02"
|
||||
@@ -688,7 +688,7 @@ Endpoint quality is diagnostic and must not be treated as coordination success.
|
||||
|
||||
```task
|
||||
id: EMAIL-WP-0002-T01
|
||||
status: todo
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "3a17215d-62a9-48ef-877f-a6fbc7e95a22"
|
||||
```
|
||||
@@ -716,7 +716,7 @@ Config file is loaded and validated.
|
||||
|
||||
```task
|
||||
id: EMAIL-WP-0002-T02
|
||||
status: todo
|
||||
status: progress
|
||||
priority: high
|
||||
state_hub_task_id: "25a4da12-1bcd-4c6d-a0eb-a2f525b9c4b9"
|
||||
```
|
||||
@@ -744,7 +744,7 @@ CLI can connect to mailbox and list/fetch messages without modifying mailbox.
|
||||
|
||||
```task
|
||||
id: EMAIL-WP-0002-T03
|
||||
status: todo
|
||||
status: progress
|
||||
priority: high
|
||||
state_hub_task_id: "16b95a6b-1375-4c91-8b78-0b75d51e0aeb"
|
||||
```
|
||||
@@ -773,7 +773,7 @@ Full rescan can revisit all messages while preserving deduplication.
|
||||
|
||||
```task
|
||||
id: EMAIL-WP-0002-T04
|
||||
status: todo
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "5a50cd85-b0ab-4017-aba0-b2087068abb4"
|
||||
```
|
||||
@@ -802,7 +802,7 @@ Scanner extracts basic metadata and text from representative bounce and reply me
|
||||
|
||||
```task
|
||||
id: EMAIL-WP-0002-T05
|
||||
status: todo
|
||||
status: progress
|
||||
priority: high
|
||||
state_hub_task_id: "8ea826d1-0add-4573-9bb4-2b73adefba55"
|
||||
```
|
||||
@@ -831,7 +831,7 @@ Representative hard and soft bounce samples are classified correctly.
|
||||
|
||||
```task
|
||||
id: EMAIL-WP-0002-T06
|
||||
status: todo
|
||||
status: progress
|
||||
priority: high
|
||||
state_hub_task_id: "4d94a332-173b-4787-8fb2-27aa63db6a8d"
|
||||
```
|
||||
@@ -880,7 +880,7 @@ Representative complaint and unsubscribe examples are classified.
|
||||
|
||||
```task
|
||||
id: EMAIL-WP-0002-T08
|
||||
status: todo
|
||||
status: progress
|
||||
priority: high
|
||||
state_hub_task_id: "6d62dea0-f416-4c0b-80a0-7c16422b8e5f"
|
||||
```
|
||||
@@ -932,7 +932,7 @@ Complaint/unsubscribe updates suppression state.
|
||||
|
||||
```task
|
||||
id: EMAIL-WP-0002-T10
|
||||
status: todo
|
||||
status: progress
|
||||
priority: medium
|
||||
state_hub_task_id: "5ab35176-d6c2-4c73-b7b3-bde4c097e3ee"
|
||||
```
|
||||
@@ -959,7 +959,7 @@ Report can be opened in spreadsheet tools.
|
||||
|
||||
```task
|
||||
id: EMAIL-WP-0002-T11
|
||||
status: todo
|
||||
status: progress
|
||||
priority: high
|
||||
state_hub_task_id: "514fa099-781b-4590-aae4-c28970413b3f"
|
||||
```
|
||||
@@ -989,7 +989,7 @@ Automated tests verify expected classification and normalized event output.
|
||||
|
||||
```task
|
||||
id: EMAIL-WP-0002-T12
|
||||
status: todo
|
||||
status: progress
|
||||
priority: medium
|
||||
state_hub_task_id: "a5f7067e-87be-4438-ba35-b12d06a8181e"
|
||||
```
|
||||
|
||||
Reference in New Issue
Block a user