| id | type | title | domain | repo | status | owner | topic_slug | created | updated | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|
| EMAIL-WP-0002 | workplan | MVP Mailbox Evidence Scanner | custodian | email-connect | finished | codex | custodian | 2026-06-02 | 2026-06-02 | c81788aa-0d0a-4493-bf41-ab6cc2068f2f |
EMAIL-WP-0002 - MVP Mailbox Evidence Scanner
Review Fixes Applied
This workplan was reviewed and registered against the local State Hub workplan convention.
Fixes applied:
- Added ADR-001 frontmatter so State Hub can index the workstream.
- Converted the MVP work packages into EMAIL-WP-0002-TNN task blocks.
- Clarified that out-of-office replies are evidence signals, not proof of reachability or awareness.
- Aligned suggested repository paths with the current repo layout.
Implementation should preserve the product rule from INTENT.md: email events are evidence, not result satisfaction.
1. MVP Name
Mailbox Evidence Scanner MVP
2. Purpose
This MVP establishes the first practical implementation slice of email-connect.
Given access to a mailbox that receives bounce mails, delivery-status notifications, out-of-office replies, human replies, complaints, unsubscribe messages, and other return messages from a previous batch of emails, the system shall scan and rescan the mailbox, classify inbound messages, extract email-channel evidence, and generate timestamped CSV evidence reports.
The MVP proves the core email-connect value without requiring outbound sending, provider webhook integration, template management, or a full UI.
3. Core MVP Hypothesis
email-connect can provide immediate standalone value by turning an inbound return mailbox into structured, timestamped email-channel evidence.
The MVP validates:
- mailbox access
- message scanning
- rescan safety
- bounce parsing
- reply classification
- evidence normalization
- deduplication
- endpoint-quality hints
- CSV reporting
The result should be useful to humans, scripts, and future coordination-engine integrations.
4. In Scope
The MVP shall support:
- Connecting to one IMAP mailbox.
- Scanning messages in a selected folder.
- Incremental scans using a stored cursor.
- Full rescans.
- Raw message metadata extraction.
- Basic MIME parsing.
- Bounce / DSN classification.
- Out-of-office classification.
- Human reply classification.
- Unknown/unparseable message classification.
- Extraction of affected recipient address where possible.
- Extraction of SMTP status code and enhanced status code where possible.
- Evidence event candidate generation.
- Deduplication of already-seen mailbox messages.
- Deduplication of already-emitted evidence.
- Timestamped CSV report generation.
- Basic local storage for scan state and parsed evidence.
- CLI entry point.
- Minimal configuration file.
5. Out of Scope for MVP
The MVP does not need to support:
- Outbound email sending.
- Provider-specific webhooks.
- Multiple email providers.
- Full UI.
- OAuth mailbox login.
- Advanced deliverability analytics.
- Advanced natural-language reply interpretation.
- Full suppression management UI.
- Full endpoint quality dashboard.
- coordination-engine live integration.
- Database server deployment.
- Multi-tenant operation.
- Complex mailbox write-back actions.
- Deleting, moving, or marking mailbox messages.
- Legal delivery assessment.
6. MVP User Story
As an operator or developer, I want to point email-connect at a mailbox containing return emails from a previous outbound batch, scan and rescan it safely, and receive a timestamped CSV report showing what email-channel evidence was found for each affected address or message.
7. Target Workflow
1. Configure mailbox access.
2. Run scanner.
3. Scanner fetches messages from selected folder.
4. Scanner parses headers, body, MIME parts, and DSN attachments.
5. Scanner classifies each message.
6. Scanner extracts evidence fields.
7. Scanner deduplicates message and evidence records.
8. Scanner stores scan state.
9. Scanner writes timestamped CSV report.
10. User reviews report or imports it elsewhere.
8. CLI Target
Initial CLI command:
```shell
email-connect scan-mailbox --config config/mailbox.yml --out reports/
```
Recommended CLI variants:
```shell
email-connect scan-mailbox --config config/mailbox.yml --out reports/
email-connect scan-mailbox --config config/mailbox.yml --full-rescan --out reports/
email-connect scan-mailbox --config config/mailbox.yml --since 2026-01-01 --out reports/
email-connect scan-mailbox --config config/mailbox.yml --report-only-new --out reports/
email-connect scan-mailbox --config config/mailbox.yml --dry-run
```
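The flags above map directly onto a subcommand parser. A minimal sketch with Python's standard argparse module, assuming a Python implementation (the suggested layout in section 22 uses src/email_connect/ and test_*.py files). The flag names come from this section; the help texts, the reports/ default for --out, and the dry-run semantics are assumptions, not decisions recorded in this workplan.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Declare the scan-mailbox subcommand with the flags from the CLI target."""
    parser = argparse.ArgumentParser(prog="email-connect")
    subcommands = parser.add_subparsers(dest="command", required=True)

    scan = subcommands.add_parser("scan-mailbox", help="scan a return mailbox for email-channel evidence")
    scan.add_argument("--config", required=True, help="path to the mailbox YAML config")
    # Assumed default, mirroring reports.output_dir in the example config.
    scan.add_argument("--out", default="reports/", help="directory for timestamped CSV reports")
    scan.add_argument("--full-rescan", action="store_true",
                      help="revisit all messages instead of resuming from the stored cursor")
    # Parsing into a date makes malformed values fail at the CLI boundary.
    scan.add_argument("--since", type=date.fromisoformat,
                      help="only scan messages received on or after YYYY-MM-DD")
    scan.add_argument("--report-only-new", action="store_true",
                      help="export only evidence not seen in earlier scans")
    # Assumed semantics: read and parse, but write neither state nor reports.
    scan.add_argument("--dry-run", action="store_true",
                      help="connect and parse without writing state or reports")
    return parser
```

Using date.fromisoformat as the argument type rejects a malformed --since value before any mailbox connection is attempted.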
9. Configuration
Example config/mailbox.yml:
```yaml
mailbox:
  id: return-mailbox-default
  protocol: imap
  host: imap.example.com
  port: 993
  tls: true
  username_env: EMAIL_CONNECT_IMAP_USER
  password_env: EMAIL_CONNECT_IMAP_PASSWORD
  folder: INBOX
scan:
  mode: incremental
  max_messages_per_run: 5000
  since: null
  include_seen: true
  mark_seen: false
  store_raw_headers: true
  store_raw_body: false
  store_raw_message_ref: true
storage:
  path: .email-connect/state.sqlite
reports:
  output_dir: reports
  include_all_evidence: true
  include_unknown_messages: true
  timestamp_timezone: UTC
```
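Loading this file can fail early on the settings that matter most for safety. A minimal sketch, assuming the YAML has already been parsed into a dict (for example with PyYAML's yaml.safe_load); MailboxConfig and load_mailbox_config are hypothetical names, and the fallback defaults (port 993, TLS on, INBOX) are assumptions taken from the example values. Rejecting mark_seen: true at load time is one way to enforce the MVP rule that the mailbox is never modified.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class MailboxConfig:
    mailbox_id: str
    host: str
    port: int
    tls: bool
    folder: str
    username: str
    password: str = field(repr=False)  # keep the secret out of reprs and logs


def load_mailbox_config(raw: Mapping, environ: Mapping[str, str] = os.environ) -> MailboxConfig:
    """Validate an already-parsed config dict and resolve credentials from the environment."""
    mailbox = raw["mailbox"]
    scan = raw.get("scan") or {}
    if mailbox.get("protocol") != "imap":
        raise ValueError("the MVP supports protocol: imap only")
    # The MVP must not modify the mailbox, so refuse configs that mark messages seen.
    if scan.get("mark_seen", False):
        raise ValueError("scan.mark_seen must be false in the MVP")
    credentials = {}
    for key in ("username_env", "password_env"):
        var_name = mailbox[key]
        if var_name not in environ:
            raise ValueError(f"environment variable {var_name} is not set")
        credentials[key] = environ[var_name]
    return MailboxConfig(
        mailbox_id=mailbox["id"],
        host=mailbox["host"],
        port=int(mailbox.get("port", 993)),
        tls=bool(mailbox.get("tls", True)),
        folder=mailbox.get("folder", "INBOX"),
        username=credentials["username_env"],
        password=credentials["password_env"],
    )
```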
10. Minimal Data Model
10.1 MailboxScan
```
MailboxScan:
  scan_id: string
  mailbox_id: string
  started_at: timestamp
  finished_at: timestamp?
  scan_mode: incremental | full_rescan
  since: timestamp?
  folder: string
  status: running | completed | failed
  messages_seen: integer
  messages_new: integer
  messages_parsed: integer
  evidence_events_created: integer
  report_path: string?
```
10.2 InboundMailboxMessage
```
InboundMailboxMessage:
  mailbox_message_id: string
  mailbox_id: string
  imap_uid: string?
  message_id_header: string?
  received_at: timestamp?
  from_address: string?
  to_addresses:
    - string
  subject: string?
  raw_headers_ref: string?
  raw_message_ref: string?
  first_seen_at: timestamp
  last_seen_at: timestamp
  deduplication_key: string
```
10.3 ParsedMailboxMessage
```
ParsedMailboxMessage:
  parsed_message_id: string
  mailbox_message_id: string
  parser_version: string
  message_class: hard_bounce | soft_bounce | delayed_delivery_notice | final_delivery_failure | out_of_office | human_reply | complaint_or_abuse | unsubscribe_or_opt_out | challenge_response | unknown_return_message | unrelated_message | parse_failed
  affected_email_address: string?
  original_message_id: string?
  original_recipient: string?
  smtp_status_code: string?
  enhanced_status_code: string?
  reason_code: string?
  confidence: low | medium | high
  parsed_at: timestamp
  notes:
    - string
```
10.4 EmailEvidenceCandidate
```
EmailEvidenceCandidate:
  evidence_candidate_id: string
  mailbox_message_id: string
  parsed_message_id: string
  event_type: string
  assessment_category: success | fail | undef
  assessment_subclass: string
  affected_email_address: string?
  original_message_id: string?
  confidence: low | medium | high
  evidence_strength: none | weak | medium | strong | negative | ambiguous
  occurred_at: timestamp?
  observed_at: timestamp
  deduplication_key: string
  raw_message_ref: string?
  notes:
    - string
```
11. Message Classification Rules
11.1 Hard Bounce
Signals:
Delivery failure notice
Permanent failure
Unknown user
Mailbox not found
Domain not found
5xx SMTP status
Enhanced status code 5.x.x
Normalized event:
notification.endpoint.rejected_permanent
Assessment:
category: fail
subclass: fail.hard_bounce
11.2 Soft Bounce
Signals:
Temporary failure
Mailbox full
Greylisting
Temporary server failure
4xx SMTP status
Enhanced status code 4.x.x
Normalized event:
notification.endpoint.rejected_temporary
Assessment:
category: undef
subclass: undef.deferred
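The 5xx/4xx and 5.x.x/4.x.x signals in 11.1 and 11.2 reduce to the class digit of the status code: RFC 3463 defines the first digit of an enhanced status code as its class (2 success, 4 persistent transient failure, 5 permanent failure). A minimal sketch, assuming the codes have already been extracted as strings; classify_by_status is a hypothetical name, and the text signals such as "Mailbox full" or greylisting are not covered here.

```python
import re
from typing import Optional

# RFC 3463 enhanced status code: class.subject.detail, e.g. 5.1.1
ENHANCED_STATUS = re.compile(r"^([245])\.\d{1,3}\.\d{1,3}$")
# Basic SMTP reply code, e.g. 550
SMTP_STATUS = re.compile(r"^([245])\d\d$")


def classify_by_status(smtp_status_code: Optional[str],
                       enhanced_status_code: Optional[str]) -> Optional[str]:
    """Return "hard_bounce", "soft_bounce", or None if the codes carry no bounce signal.

    The enhanced code is checked first because it is more specific; the basic
    SMTP code is only consulted when no valid enhanced code is present.
    """
    for code, pattern in ((enhanced_status_code, ENHANCED_STATUS), (smtp_status_code, SMTP_STATUS)):
        match = pattern.match(code.strip()) if code else None
        if match is None:
            continue
        status_class = match.group(1)
        if status_class == "5":
            return "hard_bounce"
        if status_class == "4":
            return "soft_bounce"
        return None  # class 2 is success, which is not a bounce signal
    return None
```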
11.3 Delayed Delivery Notice
Signals:
Delivery delayed
Will keep trying
Message not yet delivered
Normalized event:
notification.endpoint.deferred
Assessment:
category: undef
subclass: undef.deferred
11.4 Final Delivery Failure
Signals:
Could not deliver after retry period
Final failure
Giving up
Normalized event:
notification.endpoint.rejected_permanent
Assessment:
category: fail
subclass: fail.expired_without_delivery
11.5 Out-of-Office Reply
Signals:
Auto-reply
Out of office
Vacation
Abwesenheitsnotiz
Ich bin nicht im Büro
Normalized event:
interaction.out_of_office_received
Assessment:
category: undef
subclass: undef.out_of_office
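Automatic replies often announce themselves in headers before any phrase matching is needed: RFC 3834 defines Auto-Submitted: auto-replied, and X-Autoreply: yes is treated here as a common non-standard variant (an assumption). A minimal sketch that combines the header signal with the phrases listed above and grades confidence conservatively; the function names and the high/medium/low grading are hypothetical. It assumes DSN detection has already run, and that messages are parsed with email.policy.default so RFC 2047 encoded subjects arrive decoded.

```python
import email
import email.policy
from email.message import Message
from typing import Optional

# Phrases from the signal list above, compared against lowercased text.
OOO_PHRASES = ("out of office", "auto-reply", "vacation", "abwesenheitsnotiz", "ich bin nicht im büro")


def parse_message(raw: str) -> Message:
    # policy.default decodes RFC 2047 encoded-word headers such as Subject.
    return email.message_from_string(raw, policy=email.policy.default)


def first_plain_text(msg: Message) -> str:
    """Return the first text/plain part as text, or "" if there is none."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            try:
                return payload.decode(part.get_content_charset() or "utf-8", errors="replace")
            except LookupError:  # unknown charset label: fall back to UTF-8
                return payload.decode("utf-8", errors="replace")
    return ""


def out_of_office_confidence(msg: Message) -> Optional[str]:
    """Return "high", "medium" or "low" for a likely OOO reply, else None."""
    auto_submitted = str(msg.get("Auto-Submitted") or "").strip().lower()
    x_autoreply = str(msg.get("X-Autoreply") or "").strip().lower()
    header_signal = auto_submitted.startswith("auto-replied") or x_autoreply == "yes"
    text = (str(msg.get("Subject") or "") + "\n" + first_plain_text(msg)).lower()
    phrase_signal = any(phrase in text for phrase in OOO_PHRASES)
    if header_signal and phrase_signal:
        return "high"
    if header_signal:
        return "medium"
    if phrase_signal:
        return "low"  # phrases alone also occur in ordinary human replies
    return None
```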
11.6 Human Reply
Signals:
Non-automated reply
No bounce markers
No auto-reply markers
Human-written body likely
Normalized event:
interaction.reply_received
Assessment:
category: success
subclass: success.reply_received
Note: this is email-channel success, not necessarily coordination success.
11.7 Complaint / Abuse
Signals:
Abuse report
Spam complaint
Feedback loop
Complaint notification
Normalized event:
notification.channel.complaint_received
Assessment:
category: fail
subclass: fail.complaint_received
11.8 Unsubscribe / Opt-Out
Signals:
Unsubscribe request
Opt-out
Remove me
STOP
Normalized event:
notification.channel.unsubscribe_received
Assessment:
category: fail
subclass: fail.unsubscribed
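Complaints and opt-outs can be separated with one structural check and one conservative text check. Feedback-loop complaints in the ARF format (RFC 5965) arrive as multipart/report with report-type=feedback-report. Opt-outs are matched only against the subject and the first non-empty body line, so quoted newsletter footers in ordinary replies do not trigger them. A minimal sketch; the function names, the "opt out" phrase variant, and the UTF-8 body assumption are hypothetical.

```python
import email
import email.policy
from email.message import Message
from typing import Optional

# Opt-out phrases; a bare "stop" is handled separately below.
UNSUBSCRIBE_PHRASES = ("unsubscribe", "opt-out", "opt out", "remove me")


def parse_message(raw: str) -> Message:
    return email.message_from_string(raw, policy=email.policy.default)


def first_body_line(msg: Message) -> str:
    """Return the first non-empty line of the first text/plain part, lowercased.

    Assumes UTF-8 or ASCII bodies to keep the sketch short.
    """
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            text = (part.get_payload(decode=True) or b"").decode("utf-8", errors="replace")
            for line in text.splitlines():
                if line.strip():
                    return line.strip().lower()
            return ""
    return ""


def detect_complaint_or_unsubscribe(msg: Message) -> Optional[str]:
    # ARF complaints (RFC 5965): multipart/report; report-type=feedback-report
    if msg.get_content_type() == "multipart/report":
        report_type = msg.get_param("report-type")
        if isinstance(report_type, str) and report_type.lower() == "feedback-report":
            return "complaint_or_abuse"
    subject = str(msg.get("Subject") or "").strip().lower()
    first_line = first_body_line(msg)
    # A bare STOP counts only when it is the entire subject or first line.
    if "stop" in (subject, first_line):
        return "unsubscribe_or_opt_out"
    if any(p in subject or p in first_line for p in UNSUBSCRIBE_PHRASES):
        return "unsubscribe_or_opt_out"
    return None
```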
11.9 Unknown Return Message
Signals:
Message appears related to the return mailbox, but no reliable classification is possible
Normalized event:
notification.endpoint.unknown
Assessment:
category: undef
subclass: undef.conflicting_evidence or undef.no_signal
12. CSV Report Format
12.1 Required Columns
The timestamped CSV report shall include:
report_generated_at
scan_id
mailbox_id
mailbox_message_id
mailbox_received_at
source_from
source_to
source_subject
message_id_header
detected_message_class
normalized_event_type
assessment_category
assessment_subclass
affected_email_address
original_message_id
original_recipient
smtp_status_code
enhanced_status_code
reason_code
confidence
evidence_strength
occurred_at
observed_at
first_seen_at
last_seen_at
deduplication_key
raw_message_ref
notes
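T10 asks for deterministic headers. A minimal sketch using the standard csv module, assuming each evidence row is a dict keyed by the column names above; write_evidence_report is a hypothetical name, and joining list-valued fields such as notes with "; " is an assumption.

```python
import csv
from typing import Iterable, Mapping, TextIO

# Column order is fixed so every report has identical headers.
REPORT_COLUMNS = [
    "report_generated_at", "scan_id", "mailbox_id", "mailbox_message_id",
    "mailbox_received_at", "source_from", "source_to", "source_subject",
    "message_id_header", "detected_message_class", "normalized_event_type",
    "assessment_category", "assessment_subclass", "affected_email_address",
    "original_message_id", "original_recipient", "smtp_status_code",
    "enhanced_status_code", "reason_code", "confidence", "evidence_strength",
    "occurred_at", "observed_at", "first_seen_at", "last_seen_at",
    "deduplication_key", "raw_message_ref", "notes",
]


def write_evidence_report(rows: Iterable[Mapping], stream: TextIO) -> int:
    """Write the header plus one line per row; return the number of data rows."""
    # Missing columns become empty cells; unknown keys raise instead of being dropped.
    writer = csv.DictWriter(stream, fieldnames=REPORT_COLUMNS, restval="", extrasaction="raise")
    writer.writeheader()
    count = 0
    for row in rows:
        # List-valued fields such as notes or source_to are joined into one cell.
        flat = {k: "; ".join(v) if isinstance(v, (list, tuple)) else v for k, v in row.items()}
        writer.writerow(flat)  # None values are written as empty strings
        count += 1
    return count
```

Opening the target file with newline="" is the csv module's documented way to avoid doubled line endings on Windows.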
12.2 Filename Convention
email-channel-evidence-report-YYYYMMDD-HHMMSS.csv
Example:
email-channel-evidence-report-20260602-173000.csv
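A minimal sketch of the filename convention, assuming the timestamp is rendered in UTC as configured by reports.timestamp_timezone; report_filename is a hypothetical helper. Changing the prefix covers the optional secondary reports as well.

```python
from datetime import datetime, timezone


def report_filename(generated_at: datetime, prefix: str = "email-channel-evidence-report") -> str:
    """Build <prefix>-YYYYMMDD-HHMMSS.csv with the timestamp rendered in UTC."""
    if generated_at.tzinfo is None:
        # Refuse naive datetimes so local time is never mistaken for UTC.
        raise ValueError("generated_at must be timezone-aware")
    utc = generated_at.astimezone(timezone.utc)
    return f"{prefix}-{utc:%Y%m%d-%H%M%S}.csv"
```

Names have one-second resolution, so two runs started within the same second would produce the same filename; this sketch does not resolve that case.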
12.3 Optional Secondary Reports
The MVP may also generate:
email-channel-summary-report-YYYYMMDD-HHMMSS.csv
email-endpoint-quality-report-YYYYMMDD-HHMMSS.csv
email-parse-failures-YYYYMMDD-HHMMSS.csv
13. Deduplication Strategy
13.1 Message Deduplication
Message deduplication should use:
mailbox_id
imap_uid
message_id_header
received_at
from_address
subject hash
body hash if available
13.2 Evidence Deduplication
Evidence deduplication should use:
mailbox_message_id
parser_version
normalized_event_type
affected_email_address
original_message_id
smtp_status_code
enhanced_status_code
reason_code
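Both key recipes can be turned into stable identifiers by hashing the listed fields in a fixed order. A minimal sketch, assuming SHA-256 over the fields as listed, with the subject and body entering as their own digests as section 13.1 suggests; all function names are hypothetical, and the case-insensitive address comparison is an assumption. Because parser_version is part of the evidence key, a new parser version yields new evidence keys, which is the behavior section 13.3 asks for.

```python
import hashlib
from typing import Optional, Sequence


def _stable_hash(fields: Sequence[Optional[str]]) -> str:
    # Fields are joined with the ASCII unit separator; None is written as NUL
    # so a missing value and an empty string produce different keys.
    joined = "\x1f".join("\x00" if f is None else f for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def _norm_address(address: Optional[str]) -> Optional[str]:
    # Assumption: compare addresses case-insensitively, which is common in
    # practice even though RFC 5321 permits case-sensitive local parts.
    return address.strip().lower() if address else address


def message_dedup_key(mailbox_id: str, imap_uid: Optional[str], message_id_header: Optional[str],
                      received_at: Optional[str], from_address: Optional[str],
                      subject: Optional[str], body: Optional[bytes]) -> str:
    """Key for section 13.1; subject and body enter as their own SHA-256 digests."""
    subject_hash = hashlib.sha256(subject.encode("utf-8")).hexdigest() if subject is not None else None
    body_hash = hashlib.sha256(body).hexdigest() if body is not None else None
    return _stable_hash([mailbox_id, imap_uid, message_id_header, received_at,
                         _norm_address(from_address), subject_hash, body_hash])


def evidence_dedup_key(mailbox_message_id: str, parser_version: str, normalized_event_type: str,
                       affected_email_address: Optional[str], original_message_id: Optional[str],
                       smtp_status_code: Optional[str], enhanced_status_code: Optional[str],
                       reason_code: Optional[str]) -> str:
    """Key for section 13.2; includes parser_version so new parsers yield new keys."""
    return _stable_hash([mailbox_message_id, parser_version, normalized_event_type,
                         _norm_address(affected_email_address), original_message_id,
                         smtp_status_code, enhanced_status_code, reason_code])
```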
13.3 Rescan Behavior
A rescan should not duplicate existing evidence.
Rescans should support:
same parser version → preserve previous result unless raw message changed
new parser version → create new parse result version
report-only-new → export only newly discovered evidence
full report → export all current evidence
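Rescan safety follows from storing evidence under its deduplication key. A minimal sketch with the standard sqlite3 module, assuming the evidence_candidates table from section 14 uses deduplication_key as its primary key; the column set and function names are hypothetical. The keys returned for a scan are what a --report-only-new export would include, while a full report reads every row.

```python
import sqlite3
from typing import Iterable, List, Mapping


def open_evidence_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store keyed by the evidence deduplication key."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS evidence_candidates (
               deduplication_key TEXT PRIMARY KEY,
               first_scan_id TEXT NOT NULL,
               normalized_event_type TEXT NOT NULL,
               affected_email_address TEXT
           )"""
    )
    return conn


def record_evidence(conn: sqlite3.Connection, scan_id: str,
                    candidates: Iterable[Mapping]) -> List[str]:
    """Insert candidates idempotently and return the keys that are new in this scan.

    INSERT OR IGNORE leaves existing rows untouched, so rescanning unchanged
    messages with the same parser version adds nothing.
    """
    new_keys = []
    with conn:  # one transaction per scan batch
        for c in candidates:
            cur = conn.execute(
                "INSERT OR IGNORE INTO evidence_candidates "
                "(deduplication_key, first_scan_id, normalized_event_type, affected_email_address) "
                "VALUES (?, ?, ?, ?)",
                (c["deduplication_key"], scan_id, c["normalized_event_type"], c.get("affected_email_address")),
            )
            if cur.rowcount == 1:  # 0 when the key already existed
                new_keys.append(c["deduplication_key"])
    return new_keys
```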
14. Storage Plan
For MVP, use local SQLite.
Suggested tables:
mailbox_scans
mailbox_messages
parsed_messages
evidence_candidates
endpoint_quality
scan_cursors
raw_event_refs
This keeps the MVP simple while preserving a path to a server database later.
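The scan_cursors table is what makes incremental scans possible. A minimal sketch, assuming the cursor is the highest IMAP UID processed per mailbox and folder; the column names and helpers are hypothetical. IMAP UIDs are only comparable while the folder's UIDVALIDITY value is unchanged (RFC 3501), so the sketch stores it next to the cursor and reports no usable cursor when it changes, which the scanner would treat as a reason to fall back to a full rescan.

```python
import sqlite3
from typing import Optional

CURSOR_SCHEMA = """
CREATE TABLE IF NOT EXISTS scan_cursors (
    mailbox_id  TEXT NOT NULL,
    folder      TEXT NOT NULL,
    uidvalidity INTEGER NOT NULL,
    last_uid    INTEGER NOT NULL,
    updated_at  TEXT NOT NULL,
    PRIMARY KEY (mailbox_id, folder)
)
"""


def open_cursor_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(CURSOR_SCHEMA)
    return conn


def load_cursor(conn: sqlite3.Connection, mailbox_id: str, folder: str,
                current_uidvalidity: int) -> Optional[int]:
    """Return the last scanned UID, or None when an incremental scan is not safe."""
    row = conn.execute(
        "SELECT uidvalidity, last_uid FROM scan_cursors WHERE mailbox_id = ? AND folder = ?",
        (mailbox_id, folder),
    ).fetchone()
    # No cursor yet, or the server renumbered UIDs: the caller should rescan fully.
    if row is None or row[0] != current_uidvalidity:
        return None
    return row[1]


def save_cursor(conn: sqlite3.Connection, mailbox_id: str, folder: str,
                uidvalidity: int, last_uid: int, updated_at: str) -> None:
    with conn:
        # The primary key makes this a per-(mailbox, folder) overwrite.
        conn.execute(
            "INSERT OR REPLACE INTO scan_cursors "
            "(mailbox_id, folder, uidvalidity, last_uid, updated_at) VALUES (?, ?, ?, ?, ?)",
            (mailbox_id, folder, uidvalidity, last_uid, updated_at),
        )
```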
15. Parser Implementation Plan
15.1 Parser Pipeline
load raw message
→ parse headers
→ parse MIME parts
→ identify DSN/report parts
→ extract original recipient
→ extract final recipient
→ extract original message id
→ extract SMTP/enhanced status codes
→ classify message
→ create parsed message record
→ create evidence candidate
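For the DSN steps of the pipeline (identify report parts, extract recipients and status codes), Python's standard email package already splits a message/delivery-status part (RFC 3464) into one nested Message per blank-line-separated field group. A minimal sketch, assuming the raw source is available as a str; extract_dsn_recipients and its output keys are hypothetical, and the SMTP code is only taken from Diagnostic-Code values of type smtp.

```python
import email
import re
from typing import Dict, List, Optional

# Diagnostic-Code values look like "smtp; 550 5.1.1 User unknown".
SMTP_DIAGNOSTIC = re.compile(r"^\s*smtp\s*;\s*([245]\d\d)\b", re.IGNORECASE)


def _typed_value(value: Optional[str]) -> Optional[str]:
    """Strip the type prefix from fields like 'rfc822; user@example.org' or 'dns; mx.example.net'."""
    if not value:
        return None
    return value.split(";", 1)[-1].strip() or None


def extract_dsn_recipients(raw_message: str) -> List[Dict[str, Optional[str]]]:
    """Return one dict per per-recipient block found in message/delivery-status parts."""
    msg = email.message_from_string(raw_message)
    results = []
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        blocks = part.get_payload()
        if not isinstance(blocks, list):
            continue  # malformed report; leave it for the heuristic parsers
        # The first block holds per-message fields (Reporting-MTA); the rest are per recipient.
        for block in blocks[1:]:
            diagnostic = block.get("Diagnostic-Code")
            smtp = SMTP_DIAGNOSTIC.match(diagnostic) if diagnostic else None
            results.append({
                "original_recipient": _typed_value(block.get("Original-Recipient")),
                "final_recipient": _typed_value(block.get("Final-Recipient")),
                "action": (block.get("Action") or "").strip().lower() or None,
                "enhanced_status_code": (block.get("Status") or "").strip() or None,
                "smtp_status_code": smtp.group(1) if smtp else None,
                "diagnostic_code": diagnostic,
                "remote_mta": _typed_value(block.get("Remote-MTA")),
            })
    return results
```

Messages without a delivery-status part return an empty list, which leaves them to the bounce heuristics and the other classifiers in section 15.2.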
15.2 Parser Layers
Implement parsers in layers:
HeaderParser
MimeParser
DsnParser
BounceHeuristicParser
AutoReplyParser
ComplaintParser
UnsubscribeParser
HumanReplyHeuristicParser
EvidenceMapper
15.3 Parser Versioning
Every parse result shall include:
parser_version
This allows rescanning old messages after parser improvements.
16. Evidence Mapping
Initial mappings:
| Parsed class | Normalized event | Assessment |
|---|---|---|
| hard_bounce | notification.endpoint.rejected_permanent | fail.hard_bounce |
| soft_bounce | notification.endpoint.rejected_temporary | undef.deferred |
| delayed_delivery_notice | notification.endpoint.deferred | undef.deferred |
| final_delivery_failure | notification.endpoint.rejected_permanent | fail.expired_without_delivery |
| out_of_office | interaction.out_of_office_received | undef.out_of_office |
| human_reply | interaction.reply_received | success.reply_received |
| complaint_or_abuse | notification.channel.complaint_received | fail.complaint_received |
| unsubscribe_or_opt_out | notification.channel.unsubscribe_received | fail.unsubscribed |
| unknown_return_message | notification.endpoint.unknown | undef.conflicting_evidence |
| parse_failed | no event or diagnostic event | undef |
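The table can be kept as data so that the mapping and its golden tests share one source. A minimal sketch; EvidenceMapping, EVIDENCE_MAP and map_parsed_class are hypothetical names. challenge_response and unrelated_message are valid message classes (section 10.3) without a row in this table, so the sketch refuses them instead of inventing an assessment.

```python
from typing import Dict, NamedTuple, Optional


class EvidenceMapping(NamedTuple):
    normalized_event_type: Optional[str]  # None: no evidence event is emitted
    assessment_category: str
    assessment_subclass: Optional[str]


# One entry per row of the mapping table.
EVIDENCE_MAP: Dict[str, EvidenceMapping] = {
    "hard_bounce": EvidenceMapping("notification.endpoint.rejected_permanent", "fail", "fail.hard_bounce"),
    "soft_bounce": EvidenceMapping("notification.endpoint.rejected_temporary", "undef", "undef.deferred"),
    "delayed_delivery_notice": EvidenceMapping("notification.endpoint.deferred", "undef", "undef.deferred"),
    "final_delivery_failure": EvidenceMapping("notification.endpoint.rejected_permanent", "fail", "fail.expired_without_delivery"),
    "out_of_office": EvidenceMapping("interaction.out_of_office_received", "undef", "undef.out_of_office"),
    "human_reply": EvidenceMapping("interaction.reply_received", "success", "success.reply_received"),
    "complaint_or_abuse": EvidenceMapping("notification.channel.complaint_received", "fail", "fail.complaint_received"),
    "unsubscribe_or_opt_out": EvidenceMapping("notification.channel.unsubscribe_received", "fail", "fail.unsubscribed"),
    "unknown_return_message": EvidenceMapping("notification.endpoint.unknown", "undef", "undef.conflicting_evidence"),
    # The table allows "no event or diagnostic event"; this sketch emits no event.
    "parse_failed": EvidenceMapping(None, "undef", None),
}


def map_parsed_class(message_class: str) -> EvidenceMapping:
    """Look up the table row for a parsed class; refuse classes the table does not cover."""
    try:
        return EVIDENCE_MAP[message_class]
    except KeyError:
        raise ValueError(f"no evidence mapping defined for {message_class!r}") from None
```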
17. Endpoint Quality Updates
The scanner should update basic endpoint quality.
Examples:
| Evidence | Endpoint quality update |
|---|---|
| Hard bounce | reachability = unreachable, last_failure_at = now |
| Soft bounce | reachability = degraded, last_failure_at = now |
| Complaint | suppression_state = suppressed |
| Unsubscribe | suppression_state = opted_out |
| Human reply | last_success_at = now |
| Out of office | last_auto_reply_at = now, reachability = uncertain |
Endpoint quality is diagnostic and must not be treated as coordination success.
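A minimal sketch of the endpoint quality updates in the table above, applied in arrival order; the EndpointQuality fields mirror the table, while the initial values ("unknown", "none") and all names are hypothetical. The workplan does not define precedence between updates (for example an out-of-office reply arriving after a hard bounce), so this sketch applies the table literally.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class EndpointQuality:
    address: str
    reachability: str = "unknown"      # assumed initial value
    suppression_state: str = "none"    # assumed initial value
    last_failure_at: Optional[datetime] = None
    last_success_at: Optional[datetime] = None
    last_auto_reply_at: Optional[datetime] = None


def apply_evidence(quality: EndpointQuality, message_class: str, now: datetime) -> EndpointQuality:
    """Return an updated copy following the endpoint quality table row by row."""
    if message_class == "hard_bounce":
        return replace(quality, reachability="unreachable", last_failure_at=now)
    if message_class == "soft_bounce":
        return replace(quality, reachability="degraded", last_failure_at=now)
    if message_class == "complaint_or_abuse":
        return replace(quality, suppression_state="suppressed")
    if message_class == "unsubscribe_or_opt_out":
        return replace(quality, suppression_state="opted_out")
    if message_class == "human_reply":
        return replace(quality, last_success_at=now)
    if message_class == "out_of_office":
        return replace(quality, last_auto_reply_at=now, reachability="uncertain")
    # Classes without a row in the table leave the endpoint unchanged.
    return quality
```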
18. Work Packages
T01 - Repository Bootstrap
id: EMAIL-WP-0002-T01
status: done
priority: high
state_hub_task_id: "3a17215d-62a9-48ef-877f-a6fbc7e95a22"
Tasks:
Create repo structure
Add INTENT.md
Add or update spec/ProductRequirementsDocument.md
Add this MVP workplan
Set up basic build/test tooling
Add initial CLI entry point
Add config loading
Acceptance:
Project can run a placeholder CLI command.
Config file is loaded and validated.
T02 - Mailbox Connector
id: EMAIL-WP-0002-T02
status: done
priority: high
state_hub_task_id: "25a4da12-1bcd-4c6d-a0eb-a2f525b9c4b9"
Tasks:
Implement IMAP connection
Support TLS
Read credentials from environment variables
List/select configured folder
Fetch message metadata
Fetch full message source
Support max_messages_per_run
Support dry-run mode
Acceptance:
CLI can connect to mailbox and list/fetch messages without modifying mailbox.
T03 - Scan State and Storage
id: EMAIL-WP-0002-T03
status: done
priority: high
state_hub_task_id: "16b95a6b-1375-4c91-8b78-0b75d51e0aeb"
Tasks:
Add SQLite state store
Create schema
Store scan records
Store mailbox messages
Store scan cursor
Implement message deduplication
Implement incremental scan
Implement full rescan
Acceptance:
Running scanner twice does not duplicate mailbox messages.
Full rescan can revisit all messages while preserving deduplication.
T04 - MIME and Header Parsing
id: EMAIL-WP-0002-T04
status: done
priority: high
state_hub_task_id: "5a50cd85-b0ab-4017-aba0-b2087068abb4"
Tasks:
Parse RFC message headers
Parse Message-ID
Parse From/To/Subject/Date
Parse MIME body parts
Extract text/plain
Extract text/html fallback
Extract message/delivery-status parts where present
Store raw headers reference
Optionally store raw body reference
Acceptance:
Scanner extracts basic metadata and text from representative bounce and reply messages.
T05 - Bounce and DSN Parser
id: EMAIL-WP-0002-T05
status: done
priority: high
state_hub_task_id: "8ea826d1-0add-4573-9bb4-2b73adefba55"
Tasks:
Detect delivery-status notifications
Extract original recipient
Extract final recipient
Extract action
Extract status
Extract diagnostic code
Extract remote MTA if present
Classify hard vs soft bounce
Map SMTP/enhanced status codes
Acceptance:
Representative hard and soft bounce samples are classified correctly.
T06 - Auto-Reply and Human Reply Classifier
id: EMAIL-WP-0002-T06
status: done
priority: high
state_hub_task_id: "4d94a332-173b-4787-8fb2-27aa63db6a8d"
Tasks:
Detect out-of-office patterns
Detect auto-reply headers
Detect common German and English OOO phrases
Detect human reply fallback
Classify challenge-response as separate class if possible
Acceptance:
Representative OOO and human reply samples are classified with confidence.
T07 - Complaint and Unsubscribe Classifier
id: EMAIL-WP-0002-T07
status: done
priority: high
state_hub_task_id: "8637d383-25f7-45b5-9680-427ed2ca87bf"
Tasks:
Detect abuse/complaint messages
Detect unsubscribe/opt-out requests
Map to channel complaint/unsubscribe events
Create suppression candidates
Acceptance:
Representative complaint and unsubscribe examples are classified.
T08 - Evidence Candidate Generation
id: EMAIL-WP-0002-T08
status: done
priority: high
state_hub_task_id: "6d62dea0-f416-4c0b-80a0-7c16422b8e5f"
Tasks:
Map parsed classes to normalized event types
Generate EmailEvidenceCandidate records
Assign assessment category/subclass
Assign evidence strength
Assign confidence
Generate evidence deduplication key
Acceptance:
Parsed messages produce evidence candidates according to the mapping table.
T09 - Endpoint Quality Updates
id: EMAIL-WP-0002-T09
status: done
priority: medium
state_hub_task_id: "0d110877-953f-4aa2-961b-eec81e0159d4"
Tasks:
Create endpoint_quality table
Update endpoint quality from evidence
Track last failure/success
Track suppression state
Track reason code history
Acceptance:
Hard bounce updates endpoint quality to unreachable.
Complaint/unsubscribe updates suppression state.
T10 - CSV Report Generator
id: EMAIL-WP-0002-T10
status: done
priority: medium
state_hub_task_id: "5ab35176-d6c2-4c73-b7b3-bde4c097e3ee"
Tasks:
Generate timestamped CSV report
Include required columns
Support report-only-new
Support full evidence report
Support parse failure report
Write deterministic headers
Acceptance:
Running scanner creates a CSV report with evidence rows.
Report can be opened in spreadsheet tools.
T11 - Golden Test Corpus
id: EMAIL-WP-0002-T11
status: done
priority: high
state_hub_task_id: "514fa099-781b-4590-aae4-c28970413b3f"
Tasks:
Create test fixture directory
Add synthetic hard bounce
Add synthetic soft bounce
Add delayed delivery notice
Add final failure
Add out-of-office
Add human reply
Add unsubscribe
Add unknown return message
Add parse failure sample
Acceptance:
Automated tests verify expected classification and normalized event output.
T12 - Minimal Documentation
id: EMAIL-WP-0002-T12
status: done
priority: medium
state_hub_task_id: "a5f7067e-87be-4438-ba35-b12d06a8181e"
Tasks:
Add README quickstart
Document config file
Document CLI commands
Document CSV format
Document evidence mapping
Document limitations
Acceptance:
A developer can run the scanner against a test mailbox or fixture directory.
19. MVP Milestones
Milestone 1: Scan and Store
Goal:
Connect to mailbox, fetch messages, store metadata, deduplicate.
Includes:
T01
T02
T03
Milestone 2: Parse and Classify
Goal:
Parse messages and classify bounces, OOO, replies, complaints, unsubscribe.
Includes:
T04
T05
T06
T07
Milestone 3: Evidence and Reports
Goal:
Generate normalized evidence candidates and timestamped CSV reports.
Includes:
T08
T09
T10
Milestone 4: Confidence and Repeatability
Goal:
Add golden tests, parser versioning, and documentation.
Includes:
T11
T12
20. MVP Acceptance Criteria
The MVP is complete when:
- A user can configure access to one IMAP mailbox.
- The scanner can run without modifying mailbox contents.
- The scanner can perform incremental scans.
- The scanner can perform full rescans.
- Already-seen messages are deduplicated.
- Hard bounces are classified.
- Soft bounces are classified.
- Delayed delivery notices are classified.
- Out-of-office replies are classified.
- Human replies are classified.
- Complaints or unsubscribe messages are classified where detectable.
- Unknown messages are preserved as unknown rather than ignored silently.
- Evidence candidates are generated.
- Endpoint quality is updated.
- A timestamped CSV report is produced.
- Golden tests cover representative sample messages.
- The report format aligns with the email-connect evidence model.
- The implementation does not overclaim email evidence.
21. Design Rules
21.1 Do Not Overclaim
The scanner must not infer more than the mailbox evidence supports.
Examples:
No bounce found ≠ delivery success
Out-of-office ≠ recipient completed action
Human reply ≠ legally valid acceptance
Unknown message ≠ failure
21.2 Preserve Unknowns
Unknown and parse-failed messages should be visible in reports.
21.3 Prefer Evidence Over Status
The scanner should produce evidence rows, not only final statuses.
21.4 Make Rescans Safe
Rescans should be safe, deduplicated, and parser-version-aware.
21.5 Keep Raw References
Store enough raw reference data to allow later inspection.
22. Suggested Initial Repository Structure
```
email-connect/
  INTENT.md
  spec/
    ProductRequirementsDocument.md
  workplans/
    EMAIL-WP-0002-mvp-mailbox-evidence-scanner.md
  docs/
    EmailAdapterSpecification.md
  src/
    email_connect/
      cli/
      config/
      mailbox/
      parsing/
      evidence/
      reporting/
      storage/
  tests/
    fixtures/
      hard_bounce/
      soft_bounce/
      delayed_delivery/
      out_of_office/
      human_reply/
      unsubscribe/
      unknown/
    test_mailbox_scanner.py
    test_bounce_parser.py
    test_evidence_mapping.py
  config/
    mailbox.example.yml
  reports/
    .gitkeep
```
23. Future Extensions After MVP
Possible next steps:
Provider webhook ingestion
Outbound send API
Template manager
Minimal UI
Multi-mailbox scanning
OAuth mailbox access
Mailbox write-back actions
Advanced DSN parsing
Advanced German/English auto-reply classification
Natural-language reply intent extraction
Suppression export
coordination-engine evidence event push
Endpoint quality dashboard
Provider-specific bounce mappings
24. Summary
The Mailbox Evidence Scanner MVP is a strong first implementation slice for email-connect.
It delivers immediate practical value by converting a return mailbox into structured email-channel evidence reports. It also validates the core email-connect evidence model before implementing outbound sending, provider webhooks, or full coordination-engine integration.
The guiding rule is:
Scan the mailbox, preserve the evidence, classify conservatively, report clearly, and never treat missing evidence as delivery success.