feat: add expected recipient reporting

This commit is contained in:
2026-06-02 03:07:13 +02:00
parent 5ea6c738d2
commit b7591f531b
17 changed files with 629 additions and 22 deletions

View File

@@ -13,12 +13,18 @@ overclaiming delivery, awareness, or coordination success.
PYTHONPATH=src python3 -m unittest discover -s tests
PYTHONPATH=src python3 -m email_connect.cli adapter-descriptor
PYTHONPATH=src python3 -m email_connect.cli scan-mailbox --config config/mailbox.example.yml --out reports/
PYTHONPATH=src python3 -m email_connect.cli scan-mailbox --config config/mailbox.example.yml --expected-recipients tests/fixtures/expected_recipients.txt --out reports/
```
The example config uses `tests/fixtures/mailbox` as a mailbox source. Runtime
state is written to `.email-connect/state.sqlite`; generated CSV reports are
written to `reports/`.
Expected recipients are optional. When provided as a text file or CSV, reports
include a `known_recipient` column, place known recipients first, and add
`undef.no_signal` diagnostics for expected recipients with no mailbox evidence.
See [Mailbox Report Tutorial](docs/mailbox-report-tutorial.md).
For a live mailbox, set `mailbox.protocol: imap`, configure host, port, folder,
and credential environment variable names, then export the credentials before
running `scan-mailbox`. IMAP scans select the configured folder read-only and
@@ -35,6 +41,8 @@ marking messages seen are intentionally unsupported in this MVP.
- SQLite state store with scan cursor, message/evidence deduplication, and
endpoint quality hints.
- CSV report generation, including `--report-only-new`.
- Optional expected-recipient text/CSV input, `known_recipient` report
filtering, no-evidence diagnostics, and datetime range filtering.
- Golden fixture tests for hard bounce, soft bounce, delayed delivery, final
failure, complaint, unsubscribe, challenge-response, unknown return,
parse-failure, out-of-office, and human reply signals.

View File

@@ -15,12 +15,18 @@ scan:
mode: incremental
max_messages_per_run: 5000
since: null
from: null
to: null
include_seen: true
mark_seen: false
store_raw_headers: true
store_raw_body: false
store_raw_message_ref: true
expected_recipients:
path: null
csv_column: email
storage:
path: .email-connect/state.sqlite

View File

@@ -37,6 +37,7 @@ coordination runtime decides whether those facts satisfy a coordination case.
| `unknown_return_message` | `notification.endpoint.unknown` | `undef.conflicting_evidence` |
| `challenge_response` | `interaction.unverified_actor_interaction` | `undef.identity_uncertain` |
| `parse_failed` | `diagnostic.message.parse_failed` | `undef.parse_failed` |
| expected recipient with no evidence | `diagnostic.expected_recipient.no_evidence` | `undef.no_signal` |
## Overclaim Prevention
@@ -47,6 +48,8 @@ coordination runtime decides whether those facts satisfy a coordination case.
- Human reply does not prove legal acceptance.
- Unknown return messages remain visible.
- Parse failures are diagnostic rows, not delivery or interaction outcomes.
- Expected-recipient no-evidence rows mean no mailbox evidence was found in the
inspected range, not that notification succeeded or failed.
- Scanner and proxy interactions must stay below identity-bound interaction.
## Endpoint Quality Hints

View File

@@ -0,0 +1,139 @@
# Mailbox Report Tutorial
This tutorial shows how to generate an email-channel evidence report from a
return mailbox or from the bundled fixture mailbox.
## 1. Verify The Scanner
Run the tests and print the adapter descriptor:
```bash
PYTHONPATH=src python3 -m unittest discover -s tests
PYTHONPATH=src python3 -m email_connect.cli adapter-descriptor
```
## 2. Start With Fixtures
The example config uses `tests/fixtures/mailbox`:
```bash
PYTHONPATH=src python3 -m email_connect.cli scan-mailbox \
--config config/mailbox.example.yml \
--full-rescan \
--out reports/
```
The scanner writes a timestamped CSV file to `reports/`.
## 3. Configure A Live IMAP Mailbox
Copy `config/mailbox.example.yml` and set:
```yaml
mailbox:
protocol: imap
host: imap.example.com
port: 993
tls: true
username_env: EMAIL_CONNECT_IMAP_USER
password_env: EMAIL_CONNECT_IMAP_PASSWORD
folder: INBOX
```
Then export credentials:
```bash
export EMAIL_CONNECT_IMAP_USER='mailbox@example.com'
export EMAIL_CONNECT_IMAP_PASSWORD='app-password'
```
IMAP scans select the folder read-only and fetch messages with `BODY.PEEK[]`.
The scanner does not mark messages seen, move messages, or delete messages.
## 4. Add Expected Recipients
Expected recipients are optional. A newline-separated file can look like:
```text
missing@example.com
recipient@example.com
```
Run:
```bash
PYTHONPATH=src python3 -m email_connect.cli scan-mailbox \
--config config/mailbox.example.yml \
--expected-recipients recipients.txt \
--out reports/
```
CSV input is also supported:
```csv
email,name
missing@example.com,Missing User
recipient@example.com,Known Recipient
```
Run:
```bash
PYTHONPATH=src python3 -m email_connect.cli scan-mailbox \
--config config/mailbox.example.yml \
--expected-recipients recipients.csv \
--expected-recipient-column email \
--out reports/
```
Invalid recipient rows are ignored and printed as warnings.
## 5. Limit The Time Range
Use an inclusive datetime range:
```bash
PYTHONPATH=src python3 -m email_connect.cli scan-mailbox \
--config config/mailbox.example.yml \
--from 2026-06-02T10:00:00Z \
--to 2026-06-02T11:00:00Z \
--out reports/
```
`--since` remains a compatibility alias for the lower bound. When a range is
active, messages with no parseable `Date` header are excluded because the
scanner cannot confirm that they originated inside the requested window.
## 6. Read The Report
Key columns:
- `known_recipient`: `true` when the address was supplied in the expected list.
- `normalized_event_type`: the email evidence or diagnostic event.
- `assessment_category` and `assessment_subclass`: advisory interpretation.
- `affected_email_address`: the endpoint the row is about.
Known recipients appear first by default so spreadsheet filtering is easy.
Expected recipients with no mailbox evidence appear as:
```text
normalized_event_type: diagnostic.expected_recipient.no_evidence
assessment_category: undef
assessment_subclass: undef.no_signal
evidence_strength: none
known_recipient: true
```
That row means only that no mailbox evidence was found for the supplied address
inside the inspected range. It is not evidence of delivery success, delivery
failure, recipient awareness, or legal acceptance.
## 7. Troubleshooting
- Empty report: check folder, time range, and whether incremental cursor state
already skipped older messages. Try `--full-rescan`.
- IMAP credential error: verify the environment variable names and values.
- Missing expected rows: check the recipient file path and CSV column name.
- Unexpected no-evidence rows: confirm that the relevant mailbox evidence is
inside the configured datetime range.

View File

@@ -13,6 +13,7 @@ EMITTED_EVENT_TYPES = [
"interaction.out_of_office_received",
"notification.endpoint.unknown",
"diagnostic.message.parse_failed",
"diagnostic.expected_recipient.no_evidence",
]

View File

@@ -18,6 +18,10 @@ def main(argv: list[str] | None = None) -> int:
scan.add_argument("--out", default=None)
scan.add_argument("--full-rescan", action="store_true")
scan.add_argument("--since", default=None)
scan.add_argument("--from", dest="range_from", default=None)
scan.add_argument("--to", dest="range_to", default=None)
scan.add_argument("--expected-recipients", default=None)
scan.add_argument("--expected-recipient-column", default=None)
scan.add_argument("--report-only-new", action="store_true")
scan.add_argument("--dry-run", action="store_true")
scan.add_argument("--fixture-dir", default=None)
@@ -39,6 +43,10 @@ def main(argv: list[str] | None = None) -> int:
dry_run=args.dry_run,
fixture_dir=args.fixture_dir,
since=args.since,
range_from=args.range_from,
range_to=args.range_to,
expected_recipients_path=args.expected_recipients,
expected_recipient_column=args.expected_recipient_column,
)
print(f"scan_id={result.scan.scan_id}")
print(f"messages_seen={result.scan.messages_seen}")
@@ -47,6 +55,8 @@ def main(argv: list[str] | None = None) -> int:
print(f"evidence_events_created={result.scan.evidence_events_created}")
if result.report_path:
print(f"report_path={Path(result.report_path)}")
for warning in result.warnings:
print(f"warning={warning}")
return 0
return 2

View File

@@ -22,6 +22,8 @@ class ScanConfig:
mode: str = "incremental"
max_messages_per_run: int = 5000
since: str | None = None
range_from: str | None = None
range_to: str | None = None
include_seen: bool = True
mark_seen: bool = False
store_raw_headers: bool = True
@@ -47,6 +49,12 @@ class SourceConfig:
fixture_dir: str | None = None
@dataclass(frozen=True)
class ExpectedRecipientsConfig:
path: str | None = None
csv_column: str = "email"
@dataclass(frozen=True)
class AppConfig:
mailbox: MailboxConfig
@@ -54,6 +62,7 @@ class AppConfig:
storage: StorageConfig
reports: ReportsConfig
source: SourceConfig = SourceConfig()
expected_recipients: ExpectedRecipientsConfig = ExpectedRecipientsConfig()
def load_config(path: str | Path) -> AppConfig:
@@ -63,6 +72,7 @@ def load_config(path: str | Path) -> AppConfig:
storage = data.get("storage", {})
reports = data.get("reports", {})
source = data.get("source", {})
expected_recipients = data.get("expected_recipients", {})
return AppConfig(
mailbox=MailboxConfig(
id=str(mailbox.get("id", "return-mailbox-default")),
@@ -78,6 +88,8 @@ def load_config(path: str | Path) -> AppConfig:
mode=str(scan.get("mode", "incremental")),
max_messages_per_run=int(scan.get("max_messages_per_run", 5000)),
since=scan.get("since"),
range_from=scan.get("from") or scan.get("range_from"),
range_to=scan.get("to") or scan.get("range_to"),
include_seen=bool(scan.get("include_seen", True)),
mark_seen=bool(scan.get("mark_seen", False)),
store_raw_headers=bool(scan.get("store_raw_headers", True)),
@@ -92,6 +104,10 @@ def load_config(path: str | Path) -> AppConfig:
timestamp_timezone=str(reports.get("timestamp_timezone", "UTC")),
),
source=SourceConfig(fixture_dir=source.get("fixture_dir")),
expected_recipients=ExpectedRecipientsConfig(
path=expected_recipients.get("path"),
csv_column=str(expected_recipients.get("csv_column", "email")),
),
)

View File

@@ -57,6 +57,8 @@ class MailboxScan:
evidence_events_created: int = 0
report_path: str | None = None
since: datetime | None = None
range_start: datetime | None = None
range_end: datetime | None = None
@dataclass(frozen=True)

View File

@@ -0,0 +1,79 @@
from __future__ import annotations
import csv
import re
from dataclasses import dataclass, field
from pathlib import Path
EMAIL_RE = re.compile(r"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$", re.IGNORECASE)
@dataclass(frozen=True)
class ExpectedRecipients:
addresses: tuple[str, ...] = ()
invalid_entries: tuple[str, ...] = ()
def load_expected_recipients(
path: str | Path | None,
*,
csv_column: str | None = "email",
) -> ExpectedRecipients:
if not path:
return ExpectedRecipients()
recipient_path = Path(path)
if recipient_path.suffix.lower() == ".csv":
return _load_csv_recipients(recipient_path, csv_column=csv_column or "email")
return _load_line_recipients(recipient_path)
def normalize_email_address(value: str | None) -> str | None:
if value is None:
return None
normalized = value.strip().lower()
if not normalized:
return None
return normalized if EMAIL_RE.fullmatch(normalized) else None
@dataclass
class _RecipientCollector:
addresses: dict[str, None] = field(default_factory=dict)
invalid_entries: list[str] = field(default_factory=list)
def add(self, value: str | None, *, source: str) -> None:
normalized = normalize_email_address(value)
if normalized:
self.addresses[normalized] = None
return
if value and value.strip():
self.invalid_entries.append(f"{source}: {value.strip()}")
def result(self) -> ExpectedRecipients:
return ExpectedRecipients(
addresses=tuple(self.addresses.keys()),
invalid_entries=tuple(self.invalid_entries),
)
def _load_line_recipients(path: Path) -> ExpectedRecipients:
collector = _RecipientCollector()
for line_number, raw_line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
line = raw_line.strip()
if not line or line.startswith("#"):
continue
collector.add(line, source=f"{path}:{line_number}")
return collector.result()
def _load_csv_recipients(path: Path, *, csv_column: str) -> ExpectedRecipients:
collector = _RecipientCollector()
with path.open(newline="", encoding="utf-8") as fh:
reader = csv.DictReader(fh)
if reader.fieldnames is None:
return collector.result()
column = csv_column if csv_column in reader.fieldnames else reader.fieldnames[0]
for line_number, row in enumerate(reader, start=2):
collector.add(row.get(column), source=f"{path}:{line_number}:{column}")
return collector.result()

View File

@@ -20,6 +20,7 @@ REPORT_COLUMNS = [
"assessment_category",
"assessment_subclass",
"affected_email_address",
"known_recipient",
"original_message_id",
"original_recipient",
"smtp_status_code",
@@ -49,16 +50,17 @@ def write_evidence_report(
scan_id: str,
mailbox_id: str,
generated_at: datetime | None = None,
expected_recipients: set[str] | None = None,
) -> Path:
generated = generated_at or datetime.now(UTC)
out_dir = Path(output_dir)
out_dir.mkdir(parents=True, exist_ok=True)
path = out_dir / report_filename(generated)
path = _unique_report_path(out_dir / report_filename(generated))
with path.open("w", newline="", encoding="utf-8") as fh:
writer = csv.DictWriter(fh, fieldnames=REPORT_COLUMNS)
writer.writeheader()
for row in rows:
for row in _ordered_rows(rows, expected_recipients=expected_recipients or set()):
writer.writerow(_report_row(row, scan_id=scan_id, mailbox_id=mailbox_id, generated_at=generated))
return path
@@ -66,6 +68,7 @@ def write_evidence_report(
def _report_row(row: dict, *, scan_id: str, mailbox_id: str, generated_at: datetime) -> dict:
metadata = _json(row.get("metadata_json"))
notes = _json(row.get("notes_json"))
known_recipient = _known_recipient(row, expected_recipients=set(row.get("_expected_recipients", [])))
return {
"report_generated_at": generated_at.isoformat(),
"scan_id": scan_id,
@@ -81,6 +84,7 @@ def _report_row(row: dict, *, scan_id: str, mailbox_id: str, generated_at: datet
"assessment_category": row.get("assessment_category", ""),
"assessment_subclass": row.get("assessment_subclass", ""),
"affected_email_address": row.get("affected_email_address") or "",
"known_recipient": "true" if known_recipient else "false",
"original_message_id": row.get("original_message_id") or "",
"original_recipient": metadata.get("original_recipient", ""),
"smtp_status_code": metadata.get("smtp_status_code") or "",
@@ -98,6 +102,29 @@ def _report_row(row: dict, *, scan_id: str, mailbox_id: str, generated_at: datet
}
def _ordered_rows(rows: list[dict], *, expected_recipients: set[str]) -> list[dict]:
enriched = [dict(row, _expected_recipients=tuple(expected_recipients)) for row in rows]
if not expected_recipients:
return enriched
return sorted(
enriched,
key=lambda row: (
not _known_recipient(row, expected_recipients=expected_recipients),
str(row.get("affected_email_address") or ""),
str(row.get("observed_at") or ""),
str(row.get("event_type") or ""),
str(row.get("deduplication_key") or ""),
),
)
def _known_recipient(row: dict, *, expected_recipients: set[str]) -> bool:
if row.get("known_recipient") is True:
return True
address = str(row.get("affected_email_address") or "").lower()
return bool(address and address in expected_recipients)
def _json(value: str | None) -> dict | list:
if not value:
return {}
@@ -105,3 +132,15 @@ def _json(value: str | None) -> dict | list:
return json.loads(value)
except json.JSONDecodeError:
return {}
def _unique_report_path(path: Path) -> Path:
if not path.exists():
return path
stem = path.stem
suffix = path.suffix
for index in range(1, 1000):
candidate = path.with_name(f"{stem}-{index:02d}{suffix}")
if not candidate.exists():
return candidate
raise RuntimeError(f"Could not allocate unique report filename for {path}")

View File

@@ -1,7 +1,9 @@
from __future__ import annotations
import json
from dataclasses import dataclass
from datetime import UTC, datetime
from pathlib import Path
from uuid import uuid4
from .config import AppConfig
@@ -9,6 +11,7 @@ from .evidence import endpoint_quality_from_candidate
from .mailbox import source_for_config
from .models import MailboxScan
from .parser import parse_message_bytes
from .recipients import load_expected_recipients
from .reporting import write_evidence_report
from .storage import StateStore
@@ -17,6 +20,7 @@ from .storage import StateStore
class ScanResult:
scan: MailboxScan
report_path: Path | None
warnings: tuple[str, ...] = ()
def scan_mailbox(
@@ -28,10 +32,24 @@ def scan_mailbox(
dry_run: bool = False,
fixture_dir: str | None = None,
since: str | None = None,
range_from: str | None = None,
range_to: str | None = None,
expected_recipients_path: str | None = None,
expected_recipient_column: str | None = None,
) -> ScanResult:
started_at = datetime.now(UTC)
scan_id = str(uuid4())
since_at = _parse_since(since or config.scan.since)
range_start = _parse_datetime(range_from or since or config.scan.range_from or config.scan.since)
range_end = _parse_datetime(range_to or config.scan.range_to)
if range_start and range_end and range_start > range_end:
raise ValueError("scan datetime range lower bound must be before or equal to upper bound.")
since_at = range_start
expected = load_expected_recipients(
expected_recipients_path or config.expected_recipients.path,
csv_column=expected_recipient_column or config.expected_recipients.csv_column,
)
expected_addresses = set(expected.addresses)
warnings = tuple(f"invalid expected recipient ignored: {entry}" for entry in expected.invalid_entries)
source = source_for_config(config, fixture_dir_override=fixture_dir)
store = StateStore(config.storage.path)
@@ -58,7 +76,7 @@ def scan_mailbox(
raw_message_ref=message.raw_message_ref,
imap_uid=message.imap_uid,
)
if since_at and inbound.received_at and inbound.received_at < since_at:
if not _in_range(inbound.received_at, range_start=range_start, range_end=range_end):
continue
if dry_run:
messages_parsed += 1
@@ -91,14 +109,29 @@ def scan_mailbox(
report_path = None
if not dry_run:
range_evidence_rows = store.evidence_rows(range_start=range_start, range_end=range_end)
report_rows = store.evidence_rows(
deduplication_keys=new_evidence_keys if report_only_new else None,
range_start=range_start,
range_end=range_end,
)
report_rows = [
*report_rows,
*_no_evidence_rows(
mailbox_id=config.mailbox.id,
expected_addresses=expected_addresses,
evidence_rows=range_evidence_rows,
observed_at=datetime.now(UTC),
range_start=range_start,
range_end=range_end,
),
]
report_path = write_evidence_report(
report_rows,
output_dir=output_dir or config.reports.output_dir,
scan_id=scan_id,
mailbox_id=config.mailbox.id,
expected_recipients=expected_addresses,
)
finished_at = datetime.now(UTC)
scan = MailboxScan(
@@ -115,10 +148,12 @@ def scan_mailbox(
evidence_events_created=evidence_created,
report_path=str(report_path) if report_path else None,
since=since_at,
range_start=range_start,
range_end=range_end,
)
if not dry_run:
store.insert_scan(scan)
return ScanResult(scan=scan, report_path=report_path)
return ScanResult(scan=scan, report_path=report_path, warnings=warnings)
finally:
store.close()
@@ -142,7 +177,7 @@ def _enrich_candidate(candidate, inbound, parsed):
)
def _parse_since(value: str | None) -> datetime | None:
def _parse_datetime(value: str | None) -> datetime | None:
if not value:
return None
normalized = value.strip()
@@ -154,3 +189,81 @@ def _parse_since(value: str | None) -> datetime | None:
if parsed.tzinfo is None:
return parsed.replace(tzinfo=UTC)
return parsed.astimezone(UTC)
def _in_range(
received_at: datetime | None,
*,
range_start: datetime | None,
range_end: datetime | None,
) -> bool:
if range_start is None and range_end is None:
return True
if received_at is None:
return False
if range_start is not None and received_at < range_start:
return False
if range_end is not None and received_at > range_end:
return False
return True
def _no_evidence_rows(
*,
mailbox_id: str,
expected_addresses: set[str],
evidence_rows: list[dict],
observed_at: datetime,
range_start: datetime | None,
range_end: datetime | None,
) -> list[dict]:
if not expected_addresses:
return []
known_evidence_addresses = {
str(row.get("affected_email_address") or "").lower()
for row in evidence_rows
if row.get("affected_email_address")
}
rows = []
for address in sorted(expected_addresses - known_evidence_addresses):
rows.append(_no_evidence_row(mailbox_id, address, observed_at, range_start=range_start, range_end=range_end))
return rows
def _no_evidence_row(
mailbox_id: str,
address: str,
observed_at: datetime,
*,
range_start: datetime | None,
range_end: datetime | None,
) -> dict:
range_key = "|".join([
range_start.isoformat() if range_start else "",
range_end.isoformat() if range_end else "",
])
return {
"mailbox_message_id": "",
"event_type": "diagnostic.expected_recipient.no_evidence",
"assessment_category": "undef",
"assessment_subclass": "undef.no_signal",
"affected_email_address": address,
"original_message_id": "",
"confidence": "high",
"evidence_strength": "none",
"occurred_at": "",
"observed_at": observed_at.isoformat(),
"deduplication_key": f"{mailbox_id}|expected_recipient|no_evidence|{address}|{range_key}",
"raw_message_ref": "",
"notes_json": json.dumps([
"Expected recipient was supplied by the operator; no mailbox evidence was found in the inspected range.",
"This is not evidence of delivery success or delivery failure.",
]),
"metadata_json": json.dumps({
"message_class": "expected_recipient_no_evidence",
"original_recipient": address,
"range_start": range_start.isoformat() if range_start else None,
"range_end": range_end.isoformat() if range_end else None,
}),
"known_recipient": True,
}

View File

@@ -42,7 +42,9 @@ class StateStore:
messages_parsed integer not null,
evidence_events_created integer not null,
report_path text,
since text
since text,
range_start text,
range_end text
);
create table if not exists mailbox_messages (
@@ -121,8 +123,18 @@ class StateStore:
);
"""
)
self._ensure_column("mailbox_scans", "range_start", "text")
self._ensure_column("mailbox_scans", "range_end", "text")
self.conn.commit()
def _ensure_column(self, table: str, column: str, column_type: str) -> None:
columns = {
str(row["name"])
for row in self.conn.execute(f"pragma table_info({table})").fetchall()
}
if column not in columns:
self.conn.execute(f"alter table {table} add column {column} {column_type}")
def upsert_message(self, message: InboundMailboxMessage) -> bool:
existing = self.conn.execute(
"select mailbox_message_id from mailbox_messages where deduplication_key = ?",
@@ -214,7 +226,7 @@ class StateStore:
def insert_scan(self, scan: MailboxScan) -> None:
self.conn.execute(
"""
insert or replace into mailbox_scans values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
insert or replace into mailbox_scans values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""",
(
scan.scan_id,
@@ -230,6 +242,8 @@ class StateStore:
scan.evidence_events_created,
scan.report_path,
_dt(scan.since),
_dt(scan.range_start),
_dt(scan.range_end),
),
)
self.conn.commit()
@@ -304,7 +318,13 @@ class StateStore:
).fetchall()
return [dict(row) for row in rows]
def evidence_rows(self, *, deduplication_keys: list[str] | None = None) -> list[dict]:
def evidence_rows(
self,
*,
deduplication_keys: list[str] | None = None,
range_start: datetime | None = None,
range_end: datetime | None = None,
) -> list[dict]:
if deduplication_keys is not None:
if not deduplication_keys:
return []
@@ -317,9 +337,9 @@ class StateStore:
""",
deduplication_keys,
).fetchall()
return [dict(row) for row in rows]
return _filter_rows_by_range([dict(row) for row in rows], range_start=range_start, range_end=range_end)
rows = self.conn.execute("select * from evidence_candidates order by observed_at, event_type").fetchall()
return [dict(row) for row in rows]
return _filter_rows_by_range([dict(row) for row in rows], range_start=range_start, range_end=range_end)
def _dt(value: datetime | None) -> str | None:
@@ -330,6 +350,28 @@ def _parse_dt(value: str | None) -> datetime | None:
return datetime.fromisoformat(value) if value else None
def _filter_rows_by_range(
rows: list[dict],
*,
range_start: datetime | None,
range_end: datetime | None,
) -> list[dict]:
if range_start is None and range_end is None:
return rows
return [row for row in rows if _row_in_range(row, range_start=range_start, range_end=range_end)]
def _row_in_range(row: dict, *, range_start: datetime | None, range_end: datetime | None) -> bool:
occurred_at = _parse_dt(row.get("occurred_at"))
if occurred_at is None:
return False
if range_start is not None and occurred_at < range_start:
return False
if range_end is not None and occurred_at > range_end:
return False
return True
def _merge_endpoint_quality(
existing,
update: EndpointQualityUpdate,

View File

@@ -0,0 +1,5 @@
email,name
optout@example.com,Opt Out
csv-absent@example.com,Missing From Mailbox
OPTOut@example.com,Duplicate Case Variant
not-an-address,Invalid
1 email name
2 optout@example.com Opt Out
3 csv-absent@example.com Missing From Mailbox
4 OPTOut@example.com Duplicate Case Variant
5 not-an-address Invalid

View File

@@ -0,0 +1,4 @@
missing@example.com
absent@example.com
MISSING@example.com
not-an-address

31
tests/test_recipients.py Normal file
View File

@@ -0,0 +1,31 @@
from __future__ import annotations
import unittest
from pathlib import Path
from email_connect.recipients import load_expected_recipients, normalize_email_address
FIXTURES = Path(__file__).parent / "fixtures"
class RecipientTests(unittest.TestCase):
def test_normalizes_email_addresses(self) -> None:
self.assertEqual(normalize_email_address(" USER@Example.COM "), "user@example.com")
self.assertIsNone(normalize_email_address("not-an-address"))
def test_loads_line_separated_recipients(self) -> None:
recipients = load_expected_recipients(FIXTURES / "expected_recipients.txt")
self.assertEqual(recipients.addresses, ("missing@example.com", "absent@example.com"))
self.assertEqual(len(recipients.invalid_entries), 1)
def test_loads_csv_recipients(self) -> None:
recipients = load_expected_recipients(FIXTURES / "expected_recipients.csv", csv_column="email")
self.assertEqual(recipients.addresses, ("optout@example.com", "csv-absent@example.com"))
self.assertEqual(len(recipients.invalid_entries), 1)
if __name__ == "__main__":
unittest.main()

View File

@@ -11,6 +11,7 @@ from email_connect.storage import StateStore
FIXTURES = Path(__file__).parent / "fixtures" / "mailbox"
RECIPIENT_FIXTURES = Path(__file__).parent / "fixtures"
class ScannerTests(unittest.TestCase):
@@ -38,6 +39,10 @@ class ScannerTests(unittest.TestCase):
self.assertEqual(full.scan.messages_new, 0)
self.assertEqual(full.scan.evidence_events_created, 0)
self.assertTrue(first.report_path and first.report_path.exists())
with first.report_path.open(newline="", encoding="utf-8") as fh:
first_rows = list(DictReader(fh))
self.assertTrue(first_rows)
self.assertTrue(all(row["known_recipient"] == "false" for row in first_rows))
self.assertTrue(full.report_path and full.report_path.exists())
with full.report_path.open(newline="", encoding="utf-8") as fh:
self.assertEqual(list(DictReader(fh)), [])
@@ -65,6 +70,110 @@ class ScannerTests(unittest.TestCase):
self.assertEqual(rows["complained@example.com"]["suppression_state"], "suppressed")
self.assertEqual(rows["optout@example.com"]["suppression_state"], "opted_out")
def test_expected_recipients_sort_first_and_get_no_evidence_rows(self) -> None:
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
config = AppConfig(
mailbox=MailboxConfig(id="test-mailbox", protocol="fixture"),
scan=ScanConfig(),
storage=StorageConfig(path=str(root / "state.sqlite")),
reports=ReportsConfig(output_dir=str(root / "reports")),
source=SourceConfig(fixture_dir=str(FIXTURES)),
)
result = scan_mailbox(
config,
full_rescan=True,
expected_recipients_path=str(RECIPIENT_FIXTURES / "expected_recipients.txt"),
)
self.assertEqual(len(result.warnings), 1)
self.assertTrue(result.report_path and result.report_path.exists())
with result.report_path.open(newline="", encoding="utf-8") as fh:
rows = list(DictReader(fh))
self.assertGreater(len(rows), 2)
known_flags = [row["known_recipient"] for row in rows]
self.assertEqual(known_flags, sorted(known_flags, reverse=True))
missing_rows = [row for row in rows if row["affected_email_address"] == "missing@example.com"]
self.assertTrue(missing_rows)
self.assertTrue(all(row["known_recipient"] == "true" for row in missing_rows))
absent_rows = [row for row in rows if row["affected_email_address"] == "absent@example.com"]
self.assertEqual(len(absent_rows), 1)
self.assertEqual(absent_rows[0]["normalized_event_type"], "diagnostic.expected_recipient.no_evidence")
self.assertEqual(absent_rows[0]["assessment_category"], "undef")
self.assertEqual(absent_rows[0]["assessment_subclass"], "undef.no_signal")
self.assertEqual(absent_rows[0]["evidence_strength"], "none")
store = StateStore(config.storage.path)
try:
quality_addresses = {row["affected_email_address"] for row in store.endpoint_quality_rows()}
finally:
store.close()
self.assertNotIn("absent@example.com", quality_addresses)
def test_csv_expected_recipients_are_supported(self) -> None:
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
config = AppConfig(
mailbox=MailboxConfig(id="test-mailbox", protocol="fixture"),
scan=ScanConfig(),
storage=StorageConfig(path=str(root / "state.sqlite")),
reports=ReportsConfig(output_dir=str(root / "reports")),
source=SourceConfig(fixture_dir=str(FIXTURES)),
)
result = scan_mailbox(
config,
full_rescan=True,
expected_recipients_path=str(RECIPIENT_FIXTURES / "expected_recipients.csv"),
expected_recipient_column="email",
)
self.assertEqual(len(result.warnings), 1)
self.assertTrue(result.report_path and result.report_path.exists())
with result.report_path.open(newline="", encoding="utf-8") as fh:
rows = list(DictReader(fh))
optout_rows = [row for row in rows if row["affected_email_address"] == "optout@example.com"]
self.assertTrue(optout_rows)
self.assertTrue(all(row["known_recipient"] == "true" for row in optout_rows))
csv_absent = [row for row in rows if row["affected_email_address"] == "csv-absent@example.com"]
self.assertEqual(csv_absent[0]["normalized_event_type"], "diagnostic.expected_recipient.no_evidence")
def test_datetime_range_excludes_messages_outside_range(self) -> None:
with tempfile.TemporaryDirectory() as tmp:
root = Path(tmp)
expected_path = root / "expected.txt"
expected_path.write_text("complained@example.com\nmissing@example.com\n", encoding="utf-8")
config = AppConfig(
mailbox=MailboxConfig(id="test-mailbox", protocol="fixture"),
scan=ScanConfig(),
storage=StorageConfig(path=str(root / "state.sqlite")),
reports=ReportsConfig(output_dir=str(root / "reports")),
source=SourceConfig(fixture_dir=str(FIXTURES)),
)
result = scan_mailbox(
config,
full_rescan=True,
expected_recipients_path=str(expected_path),
range_from="2026-06-02T10:04:00Z",
range_to="2026-06-02T10:04:00Z",
)
self.assertEqual(result.scan.messages_seen, 11)
self.assertEqual(result.scan.messages_parsed, 1)
self.assertTrue(result.report_path and result.report_path.exists())
with result.report_path.open(newline="", encoding="utf-8") as fh:
rows = list(DictReader(fh))
self.assertEqual({row["affected_email_address"] for row in rows}, {"complained@example.com", "missing@example.com"})
complaint = [row for row in rows if row["affected_email_address"] == "complained@example.com"][0]
missing = [row for row in rows if row["affected_email_address"] == "missing@example.com"][0]
self.assertEqual(complaint["normalized_event_type"], "notification.channel.complaint_received")
self.assertEqual(missing["normalized_event_type"], "diagnostic.expected_recipient.no_evidence")
self.assertEqual(result.scan.range_start.isoformat(), "2026-06-02T10:04:00+00:00")
self.assertEqual(result.scan.range_end.isoformat(), "2026-06-02T10:04:00+00:00")
if __name__ == "__main__":
unittest.main()

View File

@@ -4,7 +4,7 @@ type: workplan
title: "Expected Recipient Reporting and Mailbox Tutorial"
domain: custodian
repo: email-connect
status: active
status: finished
owner: codex
topic_slug: custodian
created: "2026-06-02"
@@ -122,9 +122,9 @@ The scanner should support an optional inclusive datetime range:
Messages outside the range must be excluded before parsing and evidence
generation whenever the message timestamp is available. The range should also be
usable from config. If a message has no parseable timestamp, the implementation
should either include it with a diagnostic note or exclude it only when the
behavior is explicitly documented and tested.
usable from config. If a message has no parseable timestamp while a range is
active, it is excluded because the scanner cannot confirm that it originated
inside the requested window.
Existing `--since` behavior may be retained as a compatibility alias for the
lower bound, but the new range should be expressed clearly in documentation.
@@ -146,7 +146,7 @@ email-connect scan-mailbox --config config/mailbox.yml --from 2026-06-01T00:00:0
```task
id: EMAIL-WP-0003-T01
status: todo
status: done
priority: high
state_hub_task_id: "d1cd0de0-cbd5-4e8d-8179-000ba10e5506"
```
@@ -174,7 +174,7 @@ Invalid recipient rows are visible as warnings or diagnostics.
```task
id: EMAIL-WP-0003-T02
status: todo
status: done
priority: high
state_hub_task_id: "3d7d3bb8-4118-4158-b874-b4e0527eaa85"
```
@@ -201,7 +201,7 @@ unknown-recipient rows by default.
```task
id: EMAIL-WP-0003-T03
status: todo
status: done
priority: high
state_hub_task_id: "aa737837-2f19-4fbf-9920-f98413bd9779"
```
@@ -230,7 +230,7 @@ no-signal diagnostics, not as failures or successes.
```task
id: EMAIL-WP-0003-T04
status: todo
status: done
priority: medium
state_hub_task_id: "731cf592-1bbe-4143-b21b-721af281528c"
```
@@ -255,7 +255,7 @@ or partial expected recipients.
```task
id: EMAIL-WP-0003-T05
status: todo
status: done
priority: high
state_hub_task_id: "22585e83-d995-42d9-9ab2-c383b055fbb8"
```
@@ -284,7 +284,7 @@ message timestamp falls within the range according to the documented rules.
```task
id: EMAIL-WP-0003-T06
status: todo
status: done
priority: high
state_hub_task_id: "f30cd5b9-5035-42b4-9eca-a104e2b26ecb"
```
@@ -313,7 +313,7 @@ and datetime range filtering.
```task
id: EMAIL-WP-0003-T07
status: todo
status: done
priority: medium
state_hub_task_id: "00a29cb9-ac5a-4784-a9c4-7f2d4905405c"
```