feat: expand mailbox evidence scanner

This commit is contained in:
2026-06-02 02:07:50 +02:00
parent 8532583182
commit 226c045397
16 changed files with 670 additions and 33 deletions

View File

@@ -19,16 +19,25 @@ The example config uses `tests/fixtures/mailbox` as a mailbox source. Runtime
state is written to `.email-connect/state.sqlite`; generated CSV reports are state is written to `.email-connect/state.sqlite`; generated CSV reports are
written to `reports/`. written to `reports/`.
For a live mailbox, set `mailbox.protocol: imap`, configure host, port, folder,
and credential environment variable names, then export the credentials before
running `scan-mailbox`. IMAP scans select the configured folder read-only and
fetch message bodies with `BODY.PEEK[]`; mailbox write-back actions such as
marking messages seen are intentionally unsupported in this MVP.
## Current Scope ## Current Scope
- Coordination-engine spec review and references. - Coordination-engine spec review and references.
- Initial adapter descriptor, capability profile, evidence ceiling, and - Initial adapter descriptor, capability profile, evidence ceiling, and
limitations. limitations.
- Fixture and read-only IMAP mailbox sources.
- Conservative mailbox message parser and evidence mapper. - Conservative mailbox message parser and evidence mapper.
- SQLite state store with message and evidence deduplication. - SQLite state store with scan cursor, message/evidence deduplication, and
- CSV report generation. endpoint quality hints.
- Golden fixture tests for hard bounce, soft bounce, out-of-office, and human - CSV report generation, including `--report-only-new`.
- Golden fixture tests for hard bounce, soft bounce, delayed delivery, final
failure, complaint, unsubscribe, unknown return, out-of-office, and human
reply signals. reply signals.
IMAP access, provider webhooks, outbound sending, suppression workflows, and a Provider webhooks, outbound sending, suppression workflows, OAuth mailbox login,
UI are planned work, not complete in this first slice. and a UI remain outside this first mailbox-scanner slice.

View File

@@ -46,3 +46,19 @@ coordination runtime decides whether those facts satisfy a coordination case.
- Human reply does not prove legal acceptance. - Human reply does not prove legal acceptance.
- Unknown return messages remain visible. - Unknown return messages remain visible.
- Scanner and proxy interactions must stay below identity-bound interaction. - Scanner and proxy interactions must stay below identity-bound interaction.
## Endpoint Quality Hints
Endpoint quality rows are diagnostic state, not verdicts:
| Evidence | Quality hint |
| --- | --- |
| Hard bounce or final failure | `reachability = unreachable`, `last_failure_at` |
| Soft bounce or delayed delivery | `reachability = degraded`, `last_failure_at` |
| Complaint | `suppression_state = suppressed` |
| Unsubscribe | `suppression_state = opted_out` |
| Human reply | `last_success_at` |
| Out-of-office | `reachability = uncertain`, `last_auto_reply_at` |
These hints can guide future suppression and review workflows, but they do not
prove human awareness, authority, payload access, or coordination success.

View File

@@ -15,11 +15,13 @@ email-connect scan-mailbox --config config/mailbox.example.yml --out reports/
``` ```
It scans an inbound return mailbox source, classifies messages, stores scan It scans an inbound return mailbox source, classifies messages, stores scan
state in SQLite, and writes timestamped CSV evidence reports. state in SQLite, updates endpoint-quality hints, and writes timestamped CSV
evidence reports.
The initial source implementation supports fixture directories. The IMAP The source layer supports deterministic fixture directories and a read-only IMAP
connector remains the next mailbox boundary to complete under connector. IMAP scans select the configured folder with `readonly=True`, fetch
`EMAIL-WP-0002-T02`. messages using `BODY.PEEK[]`, and reject `mark_seen` because mailbox write-back
actions are out of scope for this MVP.
## Package Layout ## Package Layout
@@ -29,6 +31,7 @@ src/email_connect/
cli.py # command line entry points cli.py # command line entry points
config.py # config loading config.py # config loading
evidence.py # native class to normalized evidence mapping evidence.py # native class to normalized evidence mapping
mailbox.py # fixture and IMAP mailbox sources
models.py # mailbox, parse, evidence, endpoint quality dataclasses models.py # mailbox, parse, evidence, endpoint quality dataclasses
parser.py # MIME/header parsing and conservative classification parser.py # MIME/header parsing and conservative classification
reporting.py # CSV report generation reporting.py # CSV report generation
@@ -44,12 +47,19 @@ SQLite is the MVP store. The initial schema includes:
- `mailbox_messages` - `mailbox_messages`
- `parsed_messages` - `parsed_messages`
- `evidence_candidates` - `evidence_candidates`
- `scan_cursors`
- `endpoint_quality`
Message deduplication is keyed by mailbox ID, IMAP UID when present, message ID, Message deduplication is keyed by mailbox ID, IMAP UID when present, message ID,
received timestamp, sender, subject hash, and body hash. Evidence received timestamp, sender, subject hash, and body hash. Evidence
deduplication follows the workplan fields: message, parser version, normalized deduplication follows the workplan fields: message, parser version, normalized
event, affected recipient, original message, SMTP/enhanced status, and reason. event, affected recipient, original message, SMTP/enhanced status, and reason.
Incremental scans use `scan_cursors` by mailbox and folder. Full rescans ignore
the cursor while preserving message and evidence deduplication. Endpoint-quality
rows are diagnostic hints derived from explicit evidence events; they are not
coordination outcomes.
## Evidence Mapping ## Evidence Mapping
Parser output is represented as `ParsedMailboxMessage`. The mapper converts it Parser output is represented as `ParsedMailboxMessage`. The mapper converts it
@@ -60,6 +70,9 @@ Examples:
- `hard_bounce` -> `notification.endpoint.rejected_permanent` - `hard_bounce` -> `notification.endpoint.rejected_permanent`
- `soft_bounce` -> `notification.endpoint.rejected_temporary` - `soft_bounce` -> `notification.endpoint.rejected_temporary`
- `delayed_delivery_notice` -> `notification.endpoint.deferred`
- `complaint_or_abuse` -> `notification.channel.complaint_received`
- `unsubscribe_or_opt_out` -> `notification.channel.unsubscribe_received`
- `out_of_office` -> `interaction.out_of_office_received` - `out_of_office` -> `interaction.out_of_office_received`
- `human_reply` -> `interaction.reply_received` - `human_reply` -> `interaction.reply_received`
@@ -95,4 +108,5 @@ normalization layer.
PYTHONPATH=src python3 -m unittest discover -s tests PYTHONPATH=src python3 -m unittest discover -s tests
PYTHONPATH=src python3 -m email_connect.cli adapter-descriptor PYTHONPATH=src python3 -m email_connect.cli adapter-descriptor
PYTHONPATH=src python3 -m email_connect.cli scan-mailbox --config config/mailbox.example.yml --out reports/ PYTHONPATH=src python3 -m email_connect.cli scan-mailbox --config config/mailbox.example.yml --out reports/
PYTHONPATH=src python3 -m email_connect.cli scan-mailbox --config config/mailbox.example.yml --report-only-new --out reports/
``` ```

View File

@@ -38,6 +38,7 @@ def main(argv: list[str] | None = None) -> int:
report_only_new=args.report_only_new, report_only_new=args.report_only_new,
dry_run=args.dry_run, dry_run=args.dry_run,
fixture_dir=args.fixture_dir, fixture_dir=args.fixture_dir,
since=args.since,
) )
print(f"scan_id={result.scan.scan_id}") print(f"scan_id={result.scan.scan_id}")
print(f"messages_seen={result.scan.messages_seen}") print(f"messages_seen={result.scan.messages_seen}")

View File

@@ -8,6 +8,7 @@ from .models import (
AssessmentCategory, AssessmentCategory,
Confidence, Confidence,
EmailEvidenceCandidate, EmailEvidenceCandidate,
EndpointQualityUpdate,
EvidenceStrength, EvidenceStrength,
MessageClass, MessageClass,
ParsedMailboxMessage, ParsedMailboxMessage,
@@ -149,3 +150,68 @@ def candidate_from_parsed(
"reason_code": parsed.reason_code, "reason_code": parsed.reason_code,
}, },
) )
def endpoint_quality_from_candidate(candidate: EmailEvidenceCandidate) -> EndpointQualityUpdate | None:
address = candidate.affected_email_address
if not address:
return None
message_class = candidate.metadata.get("message_class")
signal = str(message_class or candidate.event_type)
reason_code = candidate.metadata.get("reason_code")
observed_at = candidate.observed_at
if message_class in {MessageClass.HARD_BOUNCE.value, MessageClass.FINAL_DELIVERY_FAILURE.value}:
return EndpointQualityUpdate(
affected_email_address=address.lower(),
reachability="unreachable",
last_failure_at=observed_at,
reason_code=reason_code,
confidence=candidate.confidence,
quality_signals=[signal, candidate.event_type],
)
if message_class in {MessageClass.SOFT_BOUNCE.value, MessageClass.DELAYED_DELIVERY_NOTICE.value}:
return EndpointQualityUpdate(
affected_email_address=address.lower(),
reachability="degraded",
last_failure_at=observed_at,
reason_code=reason_code,
confidence=candidate.confidence,
quality_signals=[signal, candidate.event_type],
)
if message_class == MessageClass.COMPLAINT_OR_ABUSE.value:
return EndpointQualityUpdate(
affected_email_address=address.lower(),
suppression_state="suppressed",
last_failure_at=observed_at,
reason_code=reason_code,
confidence=candidate.confidence,
quality_signals=[signal, candidate.event_type],
)
if message_class == MessageClass.UNSUBSCRIBE_OR_OPT_OUT.value:
return EndpointQualityUpdate(
affected_email_address=address.lower(),
suppression_state="opted_out",
reason_code=reason_code,
confidence=candidate.confidence,
quality_signals=[signal, candidate.event_type],
)
if message_class == MessageClass.HUMAN_REPLY.value:
return EndpointQualityUpdate(
affected_email_address=address.lower(),
last_success_at=observed_at,
reason_code=reason_code,
confidence=candidate.confidence,
quality_signals=[signal, candidate.event_type],
)
if message_class == MessageClass.OUT_OF_OFFICE.value:
return EndpointQualityUpdate(
affected_email_address=address.lower(),
reachability="uncertain",
last_auto_reply_at=observed_at,
reason_code=reason_code,
confidence=candidate.confidence,
quality_signals=[signal, candidate.event_type],
)
return None

View File

@@ -0,0 +1,196 @@
from __future__ import annotations
import imaplib
import os
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Iterable, Protocol
from .config import AppConfig
@dataclass(frozen=True)
class MailboxSourceMessage:
source_uid: str
raw_bytes: bytes
raw_message_ref: str
imap_uid: str | None = None
class MailboxMessageSource(Protocol):
def iter_messages(
self,
*,
max_messages: int,
since_uid: str | None,
full_rescan: bool,
include_seen: bool,
since: datetime | None,
) -> Iterable[MailboxSourceMessage]:
...
class FixtureMailboxSource:
def __init__(self, fixture_dir: str | Path) -> None:
self.fixture_dir = Path(fixture_dir)
def iter_messages(
self,
*,
max_messages: int,
since_uid: str | None,
full_rescan: bool,
include_seen: bool,
since: datetime | None,
) -> Iterable[MailboxSourceMessage]:
del include_seen, since
paths = sorted(self.fixture_dir.glob("*.eml"))
emitted = 0
for path in paths:
source_uid = path.name
if not full_rescan and since_uid and source_uid <= since_uid:
continue
yield MailboxSourceMessage(
source_uid=source_uid,
raw_bytes=path.read_bytes(),
raw_message_ref=str(path),
imap_uid=None,
)
emitted += 1
if max_messages and emitted >= max_messages:
break
class ImapMailboxSource:
def __init__(self, config: AppConfig) -> None:
self.config = config
def iter_messages(
self,
*,
max_messages: int,
since_uid: str | None,
full_rescan: bool,
include_seen: bool,
since: datetime | None,
) -> Iterable[MailboxSourceMessage]:
mailbox = self.config.mailbox
if self.config.scan.mark_seen:
raise ValueError("IMAP mark_seen is intentionally unsupported; scans are read-only.")
if not mailbox.host:
raise ValueError("mailbox.host is required for IMAP scans.")
if not mailbox.username_env or not mailbox.password_env:
raise ValueError("mailbox.username_env and mailbox.password_env are required for IMAP scans.")
username = os.environ.get(mailbox.username_env)
password = os.environ.get(mailbox.password_env)
if not username or not password:
raise ValueError(f"IMAP credentials not found in {mailbox.username_env}/{mailbox.password_env}.")
conn: imaplib.IMAP4
if mailbox.tls:
conn = imaplib.IMAP4_SSL(mailbox.host, mailbox.port)
else:
conn = imaplib.IMAP4(mailbox.host, mailbox.port)
selected = False
try:
_expect_ok(conn.login(username, password), "login")
_expect_ok(conn.select(mailbox.folder, readonly=True), f"select {mailbox.folder}")
selected = True
criteria = _search_criteria(include_seen=include_seen, since=since)
_status, search_data = _expect_ok(conn.uid("search", None, *criteria), "uid search")
uids = _decode_uids(search_data)
if not full_rescan and since_uid:
uids = [uid for uid in uids if _uid_after(uid, since_uid)]
uids = sorted(uids, key=_uid_sort_key)
if max_messages:
uids = uids[:max_messages]
for uid in uids:
_fetch_status, fetch_data = _expect_ok(conn.uid("fetch", uid, "(BODY.PEEK[])"), f"uid fetch {uid}")
raw_bytes = _raw_message_from_fetch(fetch_data)
if raw_bytes is None:
continue
yield MailboxSourceMessage(
source_uid=uid,
raw_bytes=raw_bytes,
raw_message_ref=f"imap://{mailbox.host}/{mailbox.folder};UID={uid}",
imap_uid=uid,
)
finally:
if selected:
try:
conn.close()
except imaplib.IMAP4.error:
pass
try:
conn.logout()
except imaplib.IMAP4.error:
pass
def source_for_config(config: AppConfig, *, fixture_dir_override: str | None = None) -> MailboxMessageSource:
if fixture_dir_override:
return FixtureMailboxSource(fixture_dir_override)
if config.mailbox.protocol == "fixture":
fixture_dir = config.source.fixture_dir
if not fixture_dir:
raise ValueError("source.fixture_dir is required for fixture scans.")
return FixtureMailboxSource(fixture_dir)
if config.mailbox.protocol == "imap":
return ImapMailboxSource(config)
raise ValueError(f"Unsupported mailbox protocol: {config.mailbox.protocol}")
def _search_criteria(*, include_seen: bool, since: datetime | None) -> list[str]:
criteria = ["ALL" if include_seen else "UNSEEN"]
if since is not None:
criteria.extend(["SINCE", _imap_date(since)])
return criteria
def _imap_date(value: datetime) -> str:
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
return f"{value.day:02d}-{months[value.month - 1]}-{value.year:04d}"
def _expect_ok(result: tuple[str, list], operation: str) -> tuple[str, list]:
status, data = result
if status != "OK":
raise RuntimeError(f"IMAP {operation} failed with status {status}: {data!r}")
return status, data
def _decode_uids(search_data: list) -> list[str]:
if not search_data:
return []
raw = search_data[0] or b""
if isinstance(raw, str):
raw_text = raw
else:
raw_text = raw.decode("ascii", errors="ignore")
return [part for part in raw_text.split() if part]
def _uid_after(uid: str, since_uid: str) -> bool:
try:
return int(uid) > int(since_uid)
except ValueError:
return uid > since_uid
def _uid_sort_key(uid: str) -> tuple[int, int | str]:
try:
return (0, int(uid))
except ValueError:
return (1, uid)
def _raw_message_from_fetch(fetch_data: list) -> bytes | None:
for item in fetch_data:
if isinstance(item, tuple) and len(item) >= 2 and isinstance(item[1], bytes):
return item[1]
return None

View File

@@ -2,12 +2,13 @@ from __future__ import annotations
from dataclasses import dataclass from dataclasses import dataclass
from datetime import UTC, datetime from datetime import UTC, datetime
from pathlib import Path
from uuid import uuid4 from uuid import uuid4
from .config import AppConfig from .config import AppConfig
from .evidence import endpoint_quality_from_candidate
from .mailbox import source_for_config
from .models import MailboxScan from .models import MailboxScan
from .parser import parse_message_file from .parser import parse_message_bytes
from .reporting import write_evidence_report from .reporting import write_evidence_report
from .storage import StateStore from .storage import StateStore
@@ -26,26 +27,39 @@ def scan_mailbox(
report_only_new: bool = False, report_only_new: bool = False,
dry_run: bool = False, dry_run: bool = False,
fixture_dir: str | None = None, fixture_dir: str | None = None,
since: str | None = None,
) -> ScanResult: ) -> ScanResult:
started_at = datetime.now(UTC) started_at = datetime.now(UTC)
scan_id = str(uuid4()) scan_id = str(uuid4())
source_dir = fixture_dir or config.source.fixture_dir since_at = _parse_since(since or config.scan.since)
if not source_dir: source = source_for_config(config, fixture_dir_override=fixture_dir)
raise ValueError("No source.fixture_dir configured yet. IMAP connector is planned in EMAIL-WP-0002-T02.")
files = sorted(Path(source_dir).glob("*.eml"))
if config.scan.max_messages_per_run:
files = files[: config.scan.max_messages_per_run]
store = StateStore(config.storage.path) store = StateStore(config.storage.path)
messages_seen = 0 messages_seen = 0
messages_new = 0 messages_new = 0
messages_parsed = 0 messages_parsed = 0
evidence_created = 0 evidence_created = 0
new_evidence_keys: list[str] = []
last_source_uid: str | None = None
try: try:
for path in files: cursor = None if full_rescan else store.get_scan_cursor(config.mailbox.id, config.mailbox.folder)
for message in source.iter_messages(
max_messages=config.scan.max_messages_per_run,
since_uid=cursor,
full_rescan=full_rescan,
include_seen=config.scan.include_seen,
since=since_at,
):
messages_seen += 1 messages_seen += 1
inbound, parsed, candidate = parse_message_file(path, mailbox_id=config.mailbox.id) last_source_uid = message.source_uid
inbound, parsed, candidate = parse_message_bytes(
message.raw_bytes,
mailbox_id=config.mailbox.id,
raw_message_ref=message.raw_message_ref,
imap_uid=message.imap_uid,
)
if since_at and inbound.received_at and inbound.received_at < since_at:
continue
if dry_run: if dry_run:
messages_parsed += 1 messages_parsed += 1
continue continue
@@ -55,11 +69,31 @@ def scan_mailbox(
messages_parsed += 1 messages_parsed += 1
if candidate is not None: if candidate is not None:
candidate = _enrich_candidate(candidate, inbound, parsed) candidate = _enrich_candidate(candidate, inbound, parsed)
evidence_created += int(store.insert_evidence(candidate)) inserted = store.insert_evidence(candidate)
evidence_created += int(inserted)
if inserted:
new_evidence_keys.append(candidate.deduplication_key)
quality_update = endpoint_quality_from_candidate(candidate)
if quality_update is not None:
store.apply_endpoint_quality_update(
config.mailbox.id,
quality_update,
observed_at=candidate.observed_at,
)
if not dry_run and last_source_uid:
store.set_scan_cursor(
config.mailbox.id,
config.mailbox.folder,
last_source_uid,
updated_at=datetime.now(UTC),
)
report_path = None report_path = None
if not dry_run: if not dry_run:
report_rows = store.evidence_rows() report_rows = store.evidence_rows(
deduplication_keys=new_evidence_keys if report_only_new else None,
)
report_path = write_evidence_report( report_path = write_evidence_report(
report_rows, report_rows,
output_dir=output_dir or config.reports.output_dir, output_dir=output_dir or config.reports.output_dir,
@@ -80,6 +114,7 @@ def scan_mailbox(
messages_parsed=messages_parsed, messages_parsed=messages_parsed,
evidence_events_created=evidence_created, evidence_events_created=evidence_created,
report_path=str(report_path) if report_path else None, report_path=str(report_path) if report_path else None,
since=since_at,
) )
if not dry_run: if not dry_run:
store.insert_scan(scan) store.insert_scan(scan)
@@ -105,3 +140,17 @@ def _enrich_candidate(candidate, inbound, parsed):
"metadata": metadata, "metadata": metadata,
} }
) )
def _parse_since(value: str | None) -> datetime | None:
if not value:
return None
normalized = value.strip()
if not normalized:
return None
if len(normalized) == 10:
normalized = normalized + "T00:00:00+00:00"
parsed = datetime.fromisoformat(normalized.replace("Z", "+00:00"))
if parsed.tzinfo is None:
return parsed.replace(tzinfo=UTC)
return parsed.astimezone(UTC)

View File

@@ -5,7 +5,14 @@ import sqlite3
from datetime import UTC, datetime from datetime import UTC, datetime
from pathlib import Path from pathlib import Path
from .models import EmailEvidenceCandidate, InboundMailboxMessage, MailboxScan, ParsedMailboxMessage from .models import (
Confidence,
EmailEvidenceCandidate,
EndpointQualityUpdate,
InboundMailboxMessage,
MailboxScan,
ParsedMailboxMessage,
)
class StateStore: class StateStore:
@@ -88,6 +95,30 @@ class StateStore:
notes_json text not null, notes_json text not null,
metadata_json text not null metadata_json text not null
); );
create table if not exists scan_cursors (
mailbox_id text not null,
folder text not null,
last_uid text not null,
updated_at text not null,
primary key (mailbox_id, folder)
);
create table if not exists endpoint_quality (
endpoint_key text primary key,
mailbox_id text not null,
affected_email_address text not null,
reachability text not null,
suppression_state text not null,
last_success_at text,
last_failure_at text,
last_auto_reply_at text,
reason_code text,
confidence text not null,
quality_signals_json text not null,
updated_at text not null,
unique (mailbox_id, affected_email_address)
);
""" """
) )
self.conn.commit() self.conn.commit()
@@ -203,10 +234,148 @@ class StateStore:
) )
self.conn.commit() self.conn.commit()
def evidence_rows(self) -> list[dict]: def get_scan_cursor(self, mailbox_id: str, folder: str) -> str | None:
row = self.conn.execute(
"select last_uid from scan_cursors where mailbox_id = ? and folder = ?",
(mailbox_id, folder),
).fetchone()
return str(row["last_uid"]) if row else None
def set_scan_cursor(self, mailbox_id: str, folder: str, last_uid: str, *, updated_at: datetime) -> None:
self.conn.execute(
"""
insert into scan_cursors values (?, ?, ?, ?)
on conflict(mailbox_id, folder) do update set
last_uid = excluded.last_uid,
updated_at = excluded.updated_at
""",
(mailbox_id, folder, last_uid, _dt(updated_at)),
)
self.conn.commit()
def apply_endpoint_quality_update(
self,
mailbox_id: str,
update: EndpointQualityUpdate,
*,
observed_at: datetime,
) -> None:
address = update.affected_email_address.lower()
endpoint_key = f"{mailbox_id}:{address}"
existing = self.conn.execute(
"select * from endpoint_quality where endpoint_key = ?",
(endpoint_key,),
).fetchone()
merged = _merge_endpoint_quality(existing, update, observed_at=observed_at)
self.conn.execute(
"""
insert into endpoint_quality values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
on conflict(endpoint_key) do update set
reachability = excluded.reachability,
suppression_state = excluded.suppression_state,
last_success_at = excluded.last_success_at,
last_failure_at = excluded.last_failure_at,
last_auto_reply_at = excluded.last_auto_reply_at,
reason_code = excluded.reason_code,
confidence = excluded.confidence,
quality_signals_json = excluded.quality_signals_json,
updated_at = excluded.updated_at
""",
(
endpoint_key,
mailbox_id,
address,
merged["reachability"],
merged["suppression_state"],
_dt(merged["last_success_at"]),
_dt(merged["last_failure_at"]),
_dt(merged["last_auto_reply_at"]),
merged["reason_code"],
merged["confidence"],
json.dumps(merged["quality_signals"]),
_dt(observed_at),
),
)
self.conn.commit()
def endpoint_quality_rows(self) -> list[dict]:
rows = self.conn.execute(
"select * from endpoint_quality order by affected_email_address"
).fetchall()
return [dict(row) for row in rows]
def evidence_rows(self, *, deduplication_keys: list[str] | None = None) -> list[dict]:
if deduplication_keys is not None:
if not deduplication_keys:
return []
placeholders = ", ".join("?" for _ in deduplication_keys)
rows = self.conn.execute(
f"""
select * from evidence_candidates
where deduplication_key in ({placeholders})
order by observed_at, event_type
""",
deduplication_keys,
).fetchall()
return [dict(row) for row in rows]
rows = self.conn.execute("select * from evidence_candidates order by observed_at, event_type").fetchall() rows = self.conn.execute("select * from evidence_candidates order by observed_at, event_type").fetchall()
return [dict(row) for row in rows] return [dict(row) for row in rows]
def _dt(value: datetime | None) -> str | None: def _dt(value: datetime | None) -> str | None:
return value.isoformat() if value else None return value.isoformat() if value else None
def _parse_dt(value: str | None) -> datetime | None:
return datetime.fromisoformat(value) if value else None
def _merge_endpoint_quality(
existing,
update: EndpointQualityUpdate,
*,
observed_at: datetime,
) -> dict:
existing_signals = json.loads(existing["quality_signals_json"]) if existing else []
update_signals = [signal for signal in update.quality_signals if signal]
signals = list(dict.fromkeys([*existing_signals, *update_signals]))
return {
"reachability": _merge_state(existing, "reachability", update.reachability, default="unknown"),
"suppression_state": _merge_state(
existing,
"suppression_state",
update.suppression_state,
default="unknown",
),
"last_success_at": _latest_dt(_parse_dt(existing["last_success_at"]) if existing else None, update.last_success_at),
"last_failure_at": _latest_dt(_parse_dt(existing["last_failure_at"]) if existing else None, update.last_failure_at),
"last_auto_reply_at": _latest_dt(
_parse_dt(existing["last_auto_reply_at"]) if existing else None,
update.last_auto_reply_at,
),
"reason_code": update.reason_code or (existing["reason_code"] if existing else None),
"confidence": _max_confidence(existing["confidence"] if existing else Confidence.LOW.value, update.confidence.value),
"quality_signals": signals,
"updated_at": observed_at,
}
def _merge_state(existing, field: str, new_value: str, *, default: str) -> str:
if new_value != "unknown":
return new_value
if existing:
return str(existing[field])
return default
def _latest_dt(left: datetime | None, right: datetime | None) -> datetime | None:
if left is None:
return right
if right is None:
return left
return max(left, right)
def _max_confidence(left: str, right: str) -> str:
order = {Confidence.LOW.value: 0, Confidence.MEDIUM.value: 1, Confidence.HIGH.value: 2}
return left if order.get(left, 0) >= order.get(right, 0) else right

10
tests/fixtures/mailbox/complaint.eml vendored Normal file
View File

@@ -0,0 +1,10 @@
From: Feedback Loop <fbl@example.net>
To: abuse@example.com
Subject: Spam complaint notification
Date: Tue, 02 Jun 2026 10:04:00 +0000
Message-ID: <complaint@example.net>
Content-Type: text/plain; charset=utf-8
Feedback loop abuse report.
Final-Recipient: rfc822; complained@example.com
This is a spam complaint notification for the original message.

View File

@@ -0,0 +1,10 @@
From: Mail Delivery Subsystem <mailer-daemon@example.net>
To: sender@example.com
Subject: Delivery delayed
Date: Tue, 02 Jun 2026 10:05:00 +0000
Message-ID: <delayed@example.net>
Content-Type: text/plain; charset=utf-8
Delivery delayed. We will keep trying to deliver your message.
Final-Recipient: rfc822; waiting@example.com
Status: 4.4.1

View File

@@ -0,0 +1,12 @@
From: Mail Delivery Subsystem <mailer-daemon@example.net>
To: sender@example.com
Subject: Final failure
Date: Tue, 02 Jun 2026 10:06:00 +0000
Message-ID: <final-failure@example.net>
Content-Type: text/plain; charset=utf-8
Delivery Status Notification.
Final-Recipient: rfc822; expired@example.com
Action: failed
Status: 5.4.7
Diagnostic-Code: smtp; Could not deliver after retry period. Final failure, giving up.

View File

@@ -0,0 +1,10 @@
From: Mail System <mailer@example.net>
To: sender@example.com
Subject: Return mailbox notice
Date: Tue, 02 Jun 2026 10:07:00 +0000
Message-ID: <unknown-return@example.net>
Content-Type: text/plain; charset=utf-8
This message references delivery and recipient handling, but it does not include
a reliable SMTP status or delivery-status notification.
Recipient reference: mystery@example.com

View File

@@ -0,0 +1,8 @@
From: Recipient <optout@example.com>
To: sender@example.com
Subject: Please unsubscribe me
Date: Tue, 02 Jun 2026 10:08:00 +0000
Message-ID: <unsubscribe@example.com>
Content-Type: text/plain; charset=utf-8
Please unsubscribe optout@example.com from future messages. Remove me from this list.

View File

@@ -38,6 +38,40 @@ class ParserTests(unittest.TestCase):
self.assertEqual(candidate.event_type, "interaction.reply_received") self.assertEqual(candidate.event_type, "interaction.reply_received")
self.assertEqual(candidate.assessment_subclass, "success.reply_received") self.assertEqual(candidate.assessment_subclass, "success.reply_received")
def test_delayed_delivery_notice_stays_deferred(self) -> None:
_inbound, parsed, candidate = parse_message_file(FIXTURES / "delayed_delivery.eml", mailbox_id="test")
self.assertEqual(parsed.message_class, MessageClass.DELAYED_DELIVERY_NOTICE)
self.assertIsNotNone(candidate)
self.assertEqual(candidate.event_type, "notification.endpoint.deferred")
self.assertEqual(candidate.assessment_subclass, "undef.deferred")
def test_final_failure_maps_to_expired_without_delivery(self) -> None:
_inbound, parsed, candidate = parse_message_file(FIXTURES / "final_failure.eml", mailbox_id="test")
self.assertEqual(parsed.message_class, MessageClass.FINAL_DELIVERY_FAILURE)
self.assertIsNotNone(candidate)
self.assertEqual(candidate.event_type, "notification.endpoint.rejected_permanent")
self.assertEqual(candidate.assessment_subclass, "fail.expired_without_delivery")
def test_complaint_maps_to_channel_failure(self) -> None:
_inbound, parsed, candidate = parse_message_file(FIXTURES / "complaint.eml", mailbox_id="test")
self.assertEqual(parsed.message_class, MessageClass.COMPLAINT_OR_ABUSE)
self.assertIsNotNone(candidate)
self.assertEqual(candidate.event_type, "notification.channel.complaint_received")
self.assertEqual(candidate.assessment_subclass, "fail.complaint_received")
def test_unsubscribe_maps_to_opt_out(self) -> None:
_inbound, parsed, candidate = parse_message_file(FIXTURES / "unsubscribe.eml", mailbox_id="test")
self.assertEqual(parsed.message_class, MessageClass.UNSUBSCRIBE_OR_OPT_OUT)
self.assertIsNotNone(candidate)
self.assertEqual(candidate.event_type, "notification.channel.unsubscribe_received")
self.assertEqual(candidate.assessment_subclass, "fail.unsubscribed")
def test_unknown_return_message_is_preserved(self) -> None:
_inbound, parsed, candidate = parse_message_file(FIXTURES / "unknown_return.eml", mailbox_id="test")
self.assertEqual(parsed.message_class, MessageClass.UNKNOWN_RETURN_MESSAGE)
self.assertIsNotNone(candidate)
self.assertEqual(candidate.event_type, "notification.endpoint.unknown")
if __name__ == "__main__": if __name__ == "__main__":
unittest.main() unittest.main()

View File

@@ -2,10 +2,12 @@ from __future__ import annotations
import tempfile import tempfile
import unittest import unittest
from csv import DictReader
from pathlib import Path from pathlib import Path
from email_connect.config import AppConfig, MailboxConfig, ReportsConfig, ScanConfig, SourceConfig, StorageConfig from email_connect.config import AppConfig, MailboxConfig, ReportsConfig, ScanConfig, SourceConfig, StorageConfig
from email_connect.scanner import scan_mailbox from email_connect.scanner import scan_mailbox
from email_connect.storage import StateStore
FIXTURES = Path(__file__).parent / "fixtures" / "mailbox" FIXTURES = Path(__file__).parent / "fixtures" / "mailbox"
@@ -24,13 +26,44 @@ class ScannerTests(unittest.TestCase):
) )
first = scan_mailbox(config) first = scan_mailbox(config)
second = scan_mailbox(config) second = scan_mailbox(config)
full = scan_mailbox(config, full_rescan=True, report_only_new=True)
self.assertEqual(first.scan.messages_seen, 4) self.assertEqual(first.scan.messages_seen, 9)
self.assertEqual(first.scan.messages_new, 4) self.assertEqual(first.scan.messages_new, 9)
self.assertGreaterEqual(first.scan.evidence_events_created, 4) self.assertGreaterEqual(first.scan.evidence_events_created, 9)
self.assertEqual(second.scan.messages_seen, 0)
self.assertEqual(second.scan.messages_new, 0) self.assertEqual(second.scan.messages_new, 0)
self.assertEqual(second.scan.evidence_events_created, 0) self.assertEqual(second.scan.evidence_events_created, 0)
self.assertEqual(full.scan.messages_seen, 9)
self.assertEqual(full.scan.messages_new, 0)
self.assertEqual(full.scan.evidence_events_created, 0)
self.assertTrue(first.report_path and first.report_path.exists()) self.assertTrue(first.report_path and first.report_path.exists())
self.assertTrue(full.report_path and full.report_path.exists())
with full.report_path.open(newline="", encoding="utf-8") as fh:
self.assertEqual(list(DictReader(fh)), [])
def test_scan_updates_endpoint_quality(self) -> None:
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
config = AppConfig(
mailbox=MailboxConfig(id="test-mailbox", protocol="fixture"),
scan=ScanConfig(),
storage=StorageConfig(path=str(root / "state.sqlite")),
reports=ReportsConfig(output_dir=str(root / "reports")),
source=SourceConfig(fixture_dir=str(FIXTURES)),
)
scan_mailbox(config)
store = StateStore(config.storage.path)
try:
rows = {row["affected_email_address"]: row for row in store.endpoint_quality_rows()}
finally:
store.close()
self.assertEqual(rows["missing@example.com"]["reachability"], "unreachable")
self.assertEqual(rows["full@example.com"]["reachability"], "degraded")
self.assertEqual(rows["complained@example.com"]["suppression_state"], "suppressed")
self.assertEqual(rows["optout@example.com"]["suppression_state"], "opted_out")
if __name__ == "__main__": if __name__ == "__main__":

View File

@@ -716,7 +716,7 @@ Config file is loaded and validated.
```task ```task
id: EMAIL-WP-0002-T02 id: EMAIL-WP-0002-T02
status: progress status: done
priority: high priority: high
state_hub_task_id: "25a4da12-1bcd-4c6d-a0eb-a2f525b9c4b9" state_hub_task_id: "25a4da12-1bcd-4c6d-a0eb-a2f525b9c4b9"
``` ```
@@ -744,7 +744,7 @@ CLI can connect to mailbox and list/fetch messages without modifying mailbox.
```task ```task
id: EMAIL-WP-0002-T03 id: EMAIL-WP-0002-T03
status: progress status: done
priority: high priority: high
state_hub_task_id: "16b95a6b-1375-4c91-8b78-0b75d51e0aeb" state_hub_task_id: "16b95a6b-1375-4c91-8b78-0b75d51e0aeb"
``` ```
@@ -856,7 +856,7 @@ Representative OOO and human reply samples are classified with confidence.
```task ```task
id: EMAIL-WP-0002-T07 id: EMAIL-WP-0002-T07
status: todo status: done
priority: high priority: high
state_hub_task_id: "8637d383-25f7-45b5-9680-427ed2ca87bf" state_hub_task_id: "8637d383-25f7-45b5-9680-427ed2ca87bf"
``` ```
@@ -880,7 +880,7 @@ Representative complaint and unsubscribe examples are classified.
```task ```task
id: EMAIL-WP-0002-T08 id: EMAIL-WP-0002-T08
status: progress status: done
priority: high priority: high
state_hub_task_id: "6d62dea0-f416-4c0b-80a0-7c16422b8e5f" state_hub_task_id: "6d62dea0-f416-4c0b-80a0-7c16422b8e5f"
``` ```
@@ -906,7 +906,7 @@ Parsed messages produce evidence candidates according to the mapping table.
```task ```task
id: EMAIL-WP-0002-T09 id: EMAIL-WP-0002-T09
status: todo status: done
priority: medium priority: medium
state_hub_task_id: "0d110877-953f-4aa2-961b-eec81e0159d4" state_hub_task_id: "0d110877-953f-4aa2-961b-eec81e0159d4"
``` ```
@@ -989,7 +989,7 @@ Automated tests verify expected classification and normalized event output.
```task ```task
id: EMAIL-WP-0002-T12 id: EMAIL-WP-0002-T12
status: progress status: done
priority: medium priority: medium
state_hub_task_id: "a5f7067e-87be-4438-ba35-b12d06a8181e" state_hub_task_id: "a5f7067e-87be-4438-ba35-b12d06a8181e"
``` ```