feat: start mailbox evidence scanner

2026-06-02 01:19:09 +02:00
parent 8292ffe41d
commit 8532583182
26 changed files with 1733 additions and 18 deletions


@@ -0,0 +1,46 @@
# coordination-engine Email Spec Review
## Source Specs
Reviewed local coordination-engine specifications:
- `/home/worsch/coordination-engine/spec/EmailAdapterSpecification.md`
- `/home/worsch/coordination-engine/spec/AdapterInterfaceSpecification.md`
- `/home/worsch/coordination-engine/spec/RuntimeArchitectureAndAdapterSubsystem.md`
- `/home/worsch/coordination-engine/spec/ProductRequirementsDocument.md`
These are referenced, not copied, so `email-connect` stays aligned with the
authoritative coordination-engine checkout.
## Contract Points For email-connect
- `email-connect` is a `notification`, `communication`, and `interaction`
adapter.
- It reports email-channel evidence and advisory assessment. It does not own
the evaluation of coordination case results.
- Ambiguous provider or mailbox signals must map to the weakest safe normalized
event.
- Provider and mailbox events must preserve raw references and native status
mappings.
- The adapter must expose or enable production of normalized `EvidenceEvent`
records.
- The adapter should expose endpoint quality diagnostics, but endpoint quality
is not coordination success.
- Email's positive evidence ceiling is limited: email tracking cannot prove
human awareness, identity, authority, payload access, or non-repudiation.
- Golden tests must cover overclaim prevention, in particular that a "provider
delivered" status is never promoted to awareness or payload delivery.
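
An overclaim-prevention golden test could be sketched roughly as follows. The claim labels and the `claims_above_ceiling` helper are hypothetical names chosen to mirror the evidence ceiling listed above; they are not taken from the coordination-engine specs.

```python
import unittest

# Hypothetical claim labels mirroring the evidence ceiling listed above.
ABOVE_EMAIL_CEILING = frozenset({
    "human_awareness",
    "identity",
    "authority",
    "payload_access",
    "non_repudiation",
})


def claims_above_ceiling(claims):
    """Return, sorted, the claims that email evidence must never make."""
    return sorted(set(claims) & ABOVE_EMAIL_CEILING)


class OverclaimGoldenTest(unittest.TestCase):
    def test_provider_delivered_stays_below_ceiling(self):
        # Assumed shape: the claims attached to a "provider delivered" event.
        provider_delivered_claims = {"provider_delivered"}
        self.assertEqual(claims_above_ceiling(provider_delivered_claims), [])

    def test_awareness_claim_is_flagged(self):
        claims = {"provider_delivered", "human_awareness"}
        self.assertEqual(claims_above_ceiling(claims), ["human_awareness"])
```

A suite like this would run under `python3 -m unittest`; the point is that any event carrying a claim above the ceiling fails loudly instead of passing silently.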
## First Implementation Implications
- The mailbox scanner emits `EmailEvidenceCandidate` rows that are shaped to
become coordination-engine `EvidenceEvent` records later.
- Classifiers preserve `unknown_return_message` and `parse_failed` instead of
silently discarding uncertainty.
- Out-of-office replies update diagnostics only; they do not prove awareness or
reachability.
- Human replies are email-channel success signals, but not legal acceptance or
coordination result satisfaction.
- CSV reports include event type, assessment category/subclass, confidence,
evidence strength, observed time, occurred time, raw reference, and
deduplication key for auditability.
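
As an illustration of the report shape, a row with those fields might be written as below. This is a sketch only: the column names, the example values, and the `write_report` helper are assumptions, and the actual header produced by the reporting module may differ.

```python
import csv
import io

# Hypothetical column names derived from the fields listed above.
REPORT_COLUMNS = [
    "event_type",
    "assessment_category",
    "assessment_subclass",
    "confidence",
    "evidence_strength",
    "observed_at",
    "occurred_at",
    "raw_reference",
    "dedup_key",
]


def write_report(rows, stream):
    """Write evidence rows as CSV; unexpected keys raise ValueError."""
    writer = csv.DictWriter(stream, fieldnames=REPORT_COLUMNS, extrasaction="raise")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Illustrative values only, not real scanner output.
example_row = {
    "event_type": "notification.endpoint.rejected_permanent",
    "assessment_category": "fail",
    "assessment_subclass": "hard_bounce",
    "confidence": "high",
    "evidence_strength": "strong",
    "observed_at": "2026-06-02T01:19:09+02:00",
    "occurred_at": "2026-06-02T01:18:40+02:00",
    "raw_reference": "fixtures/example-bounce.eml",
    "dedup_key": "example-dedup-key",
}

buffer = io.StringIO()
write_report([example_row], buffer)
report_text = buffer.getvalue()
```

Keeping observed and occurred time as separate columns lets an auditor see the gap between when an event happened and when the scanner saw it.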


@@ -0,0 +1,48 @@
# Email Evidence Canon
## Rule
Email events are evidence, not result satisfaction.
The scanner reports email-channel facts and uncertainty. A downstream
coordination runtime decides whether those facts satisfy a coordination case.
## Initial Message Classes
- `hard_bounce`
- `soft_bounce`
- `delayed_delivery_notice`
- `final_delivery_failure`
- `out_of_office`
- `human_reply`
- `complaint_or_abuse`
- `unsubscribe_or_opt_out`
- `challenge_response`
- `unknown_return_message`
- `unrelated_message`
- `parse_failed`
## Initial Normalized Events
| Message class | Normalized event | Assessment |
| --- | --- | --- |
| `hard_bounce` | `notification.endpoint.rejected_permanent` | `fail.hard_bounce` |
| `soft_bounce` | `notification.endpoint.rejected_temporary` | `undef.deferred` |
| `delayed_delivery_notice` | `notification.endpoint.deferred` | `undef.deferred` |
| `final_delivery_failure` | `notification.endpoint.rejected_permanent` | `fail.expired_without_delivery` |
| `out_of_office` | `interaction.out_of_office_received` | `undef.out_of_office` |
| `human_reply` | `interaction.reply_received` | `success.reply_received` |
| `complaint_or_abuse` | `notification.channel.complaint_received` | `fail.complaint_received` |
| `unsubscribe_or_opt_out` | `notification.channel.unsubscribe_received` | `fail.unsubscribed` |
| `unknown_return_message` | `notification.endpoint.unknown` | `undef.conflicting_evidence` |
| `challenge_response` | `interaction.unverified_actor_interaction` | `undef.identity_uncertain` |
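
The table above can be held as a plain lookup, sketched here. The `EvidenceMapping` type and `map_message_class` helper are hypothetical names. Classes that the table does not list (`unrelated_message`, `parse_failed`) return `None` in this sketch, because the table does not say how they are normalized.

```python
from typing import NamedTuple, Optional


class EvidenceMapping(NamedTuple):
    normalized_event: str
    assessment: str


# One entry per table row, copied from the table above.
CLASS_TO_EVIDENCE = {
    "hard_bounce": EvidenceMapping(
        "notification.endpoint.rejected_permanent", "fail.hard_bounce"),
    "soft_bounce": EvidenceMapping(
        "notification.endpoint.rejected_temporary", "undef.deferred"),
    "delayed_delivery_notice": EvidenceMapping(
        "notification.endpoint.deferred", "undef.deferred"),
    "final_delivery_failure": EvidenceMapping(
        "notification.endpoint.rejected_permanent", "fail.expired_without_delivery"),
    "out_of_office": EvidenceMapping(
        "interaction.out_of_office_received", "undef.out_of_office"),
    "human_reply": EvidenceMapping(
        "interaction.reply_received", "success.reply_received"),
    "complaint_or_abuse": EvidenceMapping(
        "notification.channel.complaint_received", "fail.complaint_received"),
    "unsubscribe_or_opt_out": EvidenceMapping(
        "notification.channel.unsubscribe_received", "fail.unsubscribed"),
    "unknown_return_message": EvidenceMapping(
        "notification.endpoint.unknown", "undef.conflicting_evidence"),
    "challenge_response": EvidenceMapping(
        "interaction.unverified_actor_interaction", "undef.identity_uncertain"),
}


def map_message_class(message_class: str) -> Optional[EvidenceMapping]:
    """Look up a message class; None means the table has no row for it."""
    return CLASS_TO_EVIDENCE.get(message_class)
```

Note that `human_reply` is the only class with a `success.*` assessment, which makes the mapping easy to guard with a golden test.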
## Overclaim Prevention
- The absence of a bounce does not mean delivery success.
- Provider acceptance does not mean endpoint acceptance.
- Endpoint acceptance does not mean inbox placement.
- Out-of-office does not prove recipient awareness or action.
- Human reply does not prove legal acceptance.
- Unknown return messages remain visible.
- Scanner and proxy interactions must stay below identity-bound interaction.


@@ -0,0 +1,98 @@
# Initial Runtime Architecture
## Status
This is the first implementation architecture for the mailbox evidence scanner
slice. It is intentionally small and stdlib-only so the repo can run before a
larger service stack is chosen.
## Service Boundary
The first slice is a CLI scanner:
```text
email-connect scan-mailbox --config config/mailbox.example.yml --out reports/
```
It scans an inbound return mailbox source, classifies messages, stores scan
state in SQLite, and writes timestamped CSV evidence reports.
The initial source implementation supports fixture directories. The IMAP
connector remains the next mailbox boundary to complete under
`EMAIL-WP-0002-T02`.
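
A stdlib-only entry point for that command could look roughly like this. It is a sketch assuming `argparse` subcommands; the `build_parser` helper and the help strings are hypothetical, and the real `cli.py` may be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the two subcommands used in this document."""
    parser = argparse.ArgumentParser(prog="email-connect")
    subcommands = parser.add_subparsers(dest="command", required=True)

    scan = subcommands.add_parser(
        "scan-mailbox", help="scan a return mailbox source and write CSV reports")
    scan.add_argument("--config", required=True, help="path to the mailbox config file")
    scan.add_argument("--out", required=True, help="directory for timestamped reports")

    subcommands.add_parser(
        "adapter-descriptor", help="print the coordination-engine adapter descriptor")
    return parser


# Parse the example invocation shown above.
args = build_parser().parse_args(
    ["scan-mailbox", "--config", "config/mailbox.example.yml", "--out", "reports/"]
)
```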
## Package Layout
```text
src/email_connect/
  adapter_contract.py  # coordination-engine descriptor and evidence ceiling
  cli.py               # command line entry points
  config.py            # config loading
  evidence.py          # native class to normalized evidence mapping
  models.py            # mailbox, parse, evidence, endpoint quality dataclasses
  parser.py            # MIME/header parsing and conservative classification
  reporting.py         # CSV report generation
  scanner.py           # scan orchestration
  storage.py           # SQLite state store
```
## Persistence
SQLite is the MVP store. The initial schema includes:
- `mailbox_scans`
- `mailbox_messages`
- `parsed_messages`
- `evidence_candidates`
Message deduplication is keyed by mailbox ID, IMAP UID when present, message ID,
received timestamp, sender, subject hash, and body hash. Evidence
deduplication follows the workplan fields: message, parser version, normalized
event, affected recipient, original message, SMTP/enhanced status, and reason.
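
Message deduplication along those lines might be sketched as follows. The `message_dedup_key` and `store_once` helpers, the SHA-256 choice, and the single-table schema are assumptions for illustration; the actual schema in `storage.py` is not reproduced here.

```python
import hashlib
import sqlite3


def message_dedup_key(mailbox_id, imap_uid, message_id, received_at,
                      sender, subject, body):
    """Derive a stable key from the message deduplication fields listed above."""
    subject_hash = hashlib.sha256(subject.encode("utf-8")).hexdigest()
    body_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    parts = [
        mailbox_id,
        "" if imap_uid is None else str(imap_uid),  # IMAP UID only when present
        message_id,
        received_at,
        sender,
        subject_hash,
        body_hash,
    ]
    # A unit separator keeps adjacent fields from running into each other.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def store_once(conn, key, message_id, received_at):
    """Insert a message row; return False if the key was already stored."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO mailbox_messages (dedup_key, message_id, received_at) "
        "VALUES (?, ?, ?)",
        (key, message_id, received_at),
    )
    return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
# Simplified, hypothetical table; only the uniqueness constraint matters here.
conn.execute(
    "CREATE TABLE mailbox_messages ("
    "dedup_key TEXT PRIMARY KEY, message_id TEXT, received_at TEXT)"
)

key = message_dedup_key(
    "returns", 101, "<bounce-1@example.org>", "2026-06-02T01:00:00+02:00",
    "mailer-daemon@example.org", "Undelivered Mail", "example body",
)
first_insert = store_once(conn, key, "<bounce-1@example.org>", "2026-06-02T01:00:00+02:00")
second_insert = store_once(conn, key, "<bounce-1@example.org>", "2026-06-02T01:00:00+02:00")
```

Letting the primary key reject duplicates keeps rescans of the same mailbox idempotent without a separate lookup query.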
## Evidence Mapping
Parser output is represented as `ParsedMailboxMessage`. The mapper converts it
to `EmailEvidenceCandidate` using coordination-engine event names and advisory
assessment classes.
Examples:
- `hard_bounce` -> `notification.endpoint.rejected_permanent`
- `soft_bounce` -> `notification.endpoint.rejected_temporary`
- `out_of_office` -> `interaction.out_of_office_received`
- `human_reply` -> `interaction.reply_received`
The mapper does not emit evidence for unrelated messages. Unknown return
messages stay visible as `notification.endpoint.unknown`.
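
The conversion step can be pictured with the sketch below. The dataclass fields and the `to_candidate` helper are hypothetical (the real dataclasses live in `models.py`), and the lookup is limited to the example classes named above.

```python
from dataclasses import dataclass
from typing import Optional

# Subset of the mapping, limited to the examples named above.
EVENT_BY_CLASS = {
    "hard_bounce": "notification.endpoint.rejected_permanent",
    "soft_bounce": "notification.endpoint.rejected_temporary",
    "out_of_office": "interaction.out_of_office_received",
    "human_reply": "interaction.reply_received",
    "unknown_return_message": "notification.endpoint.unknown",
}


@dataclass(frozen=True)
class ParsedMailboxMessage:
    # Hypothetical fields for illustration.
    message_id: str
    message_class: str
    raw_reference: str


@dataclass(frozen=True)
class EmailEvidenceCandidate:
    # Hypothetical fields for illustration.
    normalized_event: str
    source_message_id: str
    raw_reference: str


def to_candidate(parsed: ParsedMailboxMessage) -> Optional[EmailEvidenceCandidate]:
    """Return a candidate, or None when no evidence should be emitted."""
    event = EVENT_BY_CLASS.get(parsed.message_class)
    if event is None:
        # unrelated_message lands here; so does any class outside this subset.
        return None
    # The raw reference is carried through unchanged for auditability.
    return EmailEvidenceCandidate(event, parsed.message_id, parsed.raw_reference)
```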
## coordination-engine Alignment
The implementation keeps these coordination-engine concepts explicit:
- adapter descriptor
- adapter capability profile
- evidence ceiling
- advisory assessment
- endpoint quality update shape
- event observation and raw reference preservation
- golden tests for overclaim prevention
Email evidence remains below the coordination result layer. The scanner does
not infer inbox placement, human awareness, legal acceptance, payload access, or
case success.
## Provider Boundary
Provider webhook ingestion and outbound send APIs are deliberately outside this
slice. The mailbox scanner uses the same evidence model so future provider
events can enter through a parallel ingestion path and converge at the same
normalization layer.
## Development Commands
```bash
PYTHONPATH=src python3 -m unittest discover -s tests
PYTHONPATH=src python3 -m email_connect.cli adapter-descriptor
PYTHONPATH=src python3 -m email_connect.cli scan-mailbox --config config/mailbox.example.yml --out reports/
```