Compare commits

10 Commits

Author SHA1 Message Date
7cd7efe956 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-22:
  - update .custodian-brief.md for email-connect
2026-06-22 23:20:53 +02:00
208b4ae4cc Normalize agent instructions and workplan frontmatter (STATE-WP-0067)
- Align agent files with on-disk workplan prefixes (infer from workplan ids)
- Set workplan domain to registered domain_slug; add topic_slug where applicable
- Repair frontmatter delimiter formatting; migrate legacy task status literals
- Regenerate AGENTS.md, CLAUDE.md, and .claude/rules from State Hub templates
2026-06-22 23:16:24 +02:00
bfb1034132 Mark .repo-classification.yaml human-reviewed (CUST-WP-0050 T02)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 11:40:44 +02:00
55c3a5417d Reclassify as tooling (CUST-WP-0050 T02)
Apply the new 'tooling' category (reusable internal tooling/infrastructure)
from the Repo Classification Standard. First-pass agent classification.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 03:06:02 +02:00
dcda76eff2 Add repo classification (CUST-WP-0050 T02)
First-pass agent classification per the Repo Classification Standard v1.0
(canon-repo-classification); pending human review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 02:44:47 +02:00
5a13a12edf Add credential routing instructions for all agent runtimes
Propagate shared credential-routing section (Codex, Claude, Grok, llm-connect)
from state-hub template via scripts/propagate_credential_routing.py.
2026-06-18 22:48:37 +02:00
334a01d8e3 Add capability registry scaffold (REUSE-WP-0014-T04 B02)
Empty helix_forge registry layout for federation publishing.
2026-06-16 01:52:17 +02:00
b7591f531b feat: add expected recipient reporting 2026-06-02 03:07:13 +02:00
5ea6c738d2 docs: add expected recipient reporting workplan 2026-06-02 02:45:33 +02:00
7a9686f53a chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-02:
  - update .custodian-brief.md for email-connect
2026-06-02 02:43:38 +02:00
35 changed files with 1372 additions and 89 deletions

.claude/rules/agents.md

@@ -0,0 +1,20 @@
## Kaizen Agents
Specialized agent personas available on demand via the state-hub MCP.
**Discover:** `list_kaizen_agents()` — returns all agents with name, description, category
**Load:** `get_kaizen_agent("tdd-workflow")` — returns full instructions; read and follow them
Common agents:
| Agent | Category | When to use |
|-------|----------|-------------|
| `tdd-workflow` | testing | Step-by-step TDD8 workflow for any feature |
| `code-refactoring` | quality | Code quality analysis and safe refactoring |
| `test-maintenance` | testing | Diagnose and fix failing tests |
| `requirements-engineering` | process | Prevent interface/mock mismatches upfront |
| `keepaTodofile` | process | Maintain TODO.md during work |
| `project-management` | process | Track status, determine next steps |
| `datamodel-optimization` | quality | Optimize dataclasses and data structures |
All 17 agents: call `list_kaizen_agents()` for the full list.

@@ -0,0 +1,8 @@
## Architecture
<!-- TODO: Describe the key design decisions and component structure.
Key modules, data flows, external integrations, state machines, etc. -->
## Quick Reference
`~/state-hub/mcp_server/TOOLS.md` — MCP tool reference

@@ -0,0 +1,50 @@
# Credential and access routing
**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
for inference. Run this check **before** requesting secrets, API keys, SSH access,
login tokens, or database passwords — in any repo, not only `ops-warden`.
ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
other credential need belongs to another subsystem. **Do not** message
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
### Lookup (do this first)
```bash
warden route find "<describe your need>" --json
warden route show <catalog-id> --json
```
Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=email-connect` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
### Quick routing table
| I need… | Owner | ops-warden executes? |
| --- | --- | --- |
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes**: `warden sign` |
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
| Authorization decision | flex-auth | No — route only |
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
### Anti-patterns (do not do these)
- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
- Pasting secrets into Git, State Hub, workplans, logs, or chat
### Other capabilities (reuse-surface)
Non-credential capabilities are usually discovered through **reuse-surface** federation
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
every repo's agent instructions because it is high-frequency, high-risk, and easy to
get wrong.
**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`

@@ -0,0 +1,38 @@
## First Session Protocol
Triggered when `get_domain_summary("infotech")` shows **no workstreams**.
The project is registered but work has not yet been structured.
**Step 1 — Read, don't write**
- `~/the-custodian/canon/projects/infotech/project_charter_v0.1.md` — purpose, scope
- `~/the-custodian/canon/projects/infotech/roadmap_v0.1.md` — planned phases
- Scan repo root: README, directory structure, existing code or docs
**Step 2 — Survey in-progress work**
Look for TODOs, open branches, half-finished files. Note done vs. started but incomplete.
**Step 3 — Propose workstreams to Bernd**
Propose 1 to 3 workstreams, each a coherent strand lasting weeks to months and anchored to a
roadmap phase. **Wait for approval before creating.**
**Step 4 — Create workplan file first, then DB record (ADR-001)**
```
workplans/EMAIL-WP-NNNN-<slug>.md ← write this first
```
Then register in the hub:
```
create_workstream(topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a", title="...", owner="...", description="...")
create_task(workstream_id="<id>", title="...", priority="high|medium|low")
```
**Step 5 — Record the setup**
```
add_progress_event(
    summary="First session: structured infotech into N workstreams, M tasks",
    event_type="milestone",
    topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a",
    detail={"workstreams": [...], "tasks_created": M}
)
```
<!-- Delete or archive this file once past first session -->

@@ -0,0 +1,8 @@
## Repo boundary
This repo owns **email-connect** only. It does not own:
<!-- TODO: List what belongs in adjacent repos, e.g.:
- SSH key management → railiance-infra/
- State hub code → state-hub/
-->

@@ -0,0 +1,5 @@
**Purpose:** Headless provider-neutral email communication and evidence service for sending, tracking, diagnosing, and normalizing email-channel events.
**Domain:** infotech
**Repo slug:** email-connect
**Topic ID:** cee7bedf-2b48-46ef-8601-006474f2ad7a

@@ -0,0 +1,85 @@
## Session Protocol
Dev Hub (State Hub API): http://127.0.0.1:8000
MCP server name in `~/.claude.json`: `dev-hub`
**Step 1 — Orient**
Read the offline-safe brief first — it works without a live hub connection:
```bash
cat .custodian-brief.md
```
Then call the MCP tool for richer cross-domain context when MCP tools are exposed:
```
get_domain_summary("infotech")
```
If MCP tools are unavailable in the current agent session, use the REST API:
```bash
curl -s "http://127.0.0.1:8000/state/summary" | python3 -m json.tool
```
If the hub is offline: `cd ~/state-hub && make api`
**Step 2 — Check inbox**
With MCP tools:
```
get_messages(to_agent="email-connect", unread_only=True)
```
Mark read with `mark_message_read(message_id)`. Reply or act on coordination
requests before proceeding.
Without MCP tools:
```bash
curl -s "http://127.0.0.1:8000/messages/?to_agent=email-connect&unread_only=true" \
| python3 -m json.tool
curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
-H "Content-Type: application/json" -d '{}'
```
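Some sessions have neither MCP tools nor `curl`. In that case, the two inbox calls above can be issued from Python's standard library. This is a minimal sketch. It assumes the endpoints accept exactly the query string and the empty JSON body shown in the `curl` commands. The helper names `inbox_request`, `mark_read_request`, and `send` are hypothetical and not part of State Hub.

```python
import json
import urllib.parse
import urllib.request

HUB = "http://127.0.0.1:8000"  # default Dev Hub address from this file


def inbox_request(agent: str, base: str = HUB) -> urllib.request.Request:
    # GET /messages/?to_agent=<agent>&unread_only=true, as in the curl example
    query = urllib.parse.urlencode({"to_agent": agent, "unread_only": "true"})
    return urllib.request.Request(f"{base}/messages/?{query}", method="GET")


def mark_read_request(message_id: str, base: str = HUB) -> urllib.request.Request:
    # PATCH /messages/<id>/read with an empty JSON object as the body
    return urllib.request.Request(
        f"{base}/messages/{message_id}/read",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


def send(req: urllib.request.Request) -> object:
    # Performs the HTTP call and decodes the JSON reply (needs a running hub)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

When the hub is running at the default address, `send(inbox_request("email-connect"))` should return the unread messages as decoded JSON.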
**Step 3 — Scan workplans**
```bash
ls workplans/
```
For each file with `status: ready`, `active`, or `blocked`, note pending
`wait`/`todo`/`progress` tasks.
**Step 4 — Present brief**
1. **Active workstreams** for `infotech` — title, task counts, blocking decisions
2. **Pending tasks** from `workplans/` + any `[repo:email-connect]` hub tasks
3. **Goal guidance** — if `goal_guidance` in summary:
   - `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"*
   - `alignment_warnings`: flag if active work is not aligned with current goal
4. **Suggested next action** — highest-priority open item
5. **SBOM status** — flag if `last_sbom_at` is unset for this repo
If no workstreams: follow First Session Protocol (`first-session.md`).
**During work:** `record_decision()` · `add_progress_event()` · `resolve_decision()`
> State Hub is a *read model*. Bootstrap tools (`create_workstream`, `create_task`)
> are First Session Protocol only. Work structure belongs in repo files (ADR-001).
**Session close:**
With MCP tools:
```
add_progress_event(summary="...", topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a", workstream_id="<uuid>")
```
Without MCP tools:
```bash
curl -s -X POST http://127.0.0.1:8000/progress/ \
-H "Content-Type: application/json" \
-d '{"topic_id":"cee7bedf-2b48-46ef-8601-006474f2ad7a","workstream_id":"<uuid>","event_type":"note","summary":"what changed","author":"codex"}'
```
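The same `POST /progress/` call can be sketched with `urllib.request`. The sketch assumes the endpoint accepts the JSON fields from the `curl` example and nothing more. `progress_event_request` is a hypothetical helper name.

```python
import json
import urllib.request


def progress_event_request(
    topic_id: str,
    workstream_id: str,
    summary: str,
    author: str = "codex",
    event_type: str = "note",
    base: str = "http://127.0.0.1:8000",
) -> urllib.request.Request:
    # Same fields as the curl example: topic_id, workstream_id, event_type, summary, author
    payload = {
        "topic_id": topic_id,
        "workstream_id": workstream_id,
        "event_type": event_type,
        "summary": summary,
        "author": author,
    }
    return urllib.request.Request(
        f"{base}/progress/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once the hub is reachable, send the request with `urllib.request.urlopen(req, timeout=10)`.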
If workplan files were modified, ensure the local copy is up to date first:
```bash
git -C <repo_path> pull --ff-only
cd ~/state-hub && make fix-consistency REPO=email-connect
```
For repos where implementation runs on a remote machine (e.g. CoulombCore),
use the combined target which pulls before fixing:
```bash
cd ~/state-hub && make fix-consistency-remote REPO=email-connect
```
**C-15** (DB task ahead of file) is normal in multi-machine workflows — writeback
will sync the file to match DB. **C-16** (repo behind remote) blocks all writes
until you pull — intentional to prevent clobbering remote progress.

@@ -0,0 +1,19 @@
## Stack
<!-- TODO: Fill in language, frameworks, and key dependencies -->
- **Language:**
- **Key deps:**
## Dev Commands
```bash
# TODO: Fill in the standard commands for this repo
# Install dependencies
# Run tests
# Lint / type check
# Build / package (if applicable)
```

@@ -0,0 +1,40 @@
## Workplan Convention (ADR-001)
File location: `workplans/EMAIL-WP-NNNN-<slug>.md`
ID prefix: `EMAIL-WP-`
Work items originate as files in this repo **before** being registered in the hub.
Canonical workplan/workstream frontmatter statuses are:
`proposed`, `ready`, `active`, `blocked`, `backlog`, `finished`, `archived`.
Use `proposed` for a newly drafted plan, `ready` after review against current
repo state, and `finished` when implementation is complete. `stalled` and
`needs_review` are derived health labels, not stored statuses.
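A pre-commit style check for this rule could look like the sketch below. It encodes only the two lists given above. The function name is hypothetical, and no such check is claimed to exist in State Hub tooling.

```python
# Statuses that may be written to workplan/workstream frontmatter
CANONICAL_STATUSES = frozenset(
    {"proposed", "ready", "active", "blocked", "backlog", "finished", "archived"}
)
# Health labels computed by tooling; never stored in frontmatter
DERIVED_LABELS = frozenset({"stalled", "needs_review"})


def check_workplan_status(status: str) -> str:
    """Return the status unchanged if it may be stored, else raise ValueError."""
    if status in CANONICAL_STATUSES:
        return status
    if status in DERIVED_LABELS:
        raise ValueError(f"{status!r} is a derived health label, not a stored status")
    raise ValueError(f"unknown workplan status {status!r}")
```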
Closed workplans may be moved to `workplans/archived/` with a completion-date
prefix: `YYMMDD-EMAIL-WP-NNNN-<slug>.md`. The frontmatter id remains
unchanged; the prefix is only for quick visual reference.
Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**:
`workplans/ADHOC-YYYY-MM-DD.md`, workstream slug `adhoc-YYYY-MM-DD`, and task ids
`ADHOC-YYYY-MM-DD-T01`, `T02`, etc. Use adhocs only for low-risk work completed
directly. Promote anything requiring analysis, design, approval, dependencies, or
multiple planned phases into a normal workplan.
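Both naming rules are mechanical, so they fit in a few lines. This minimal sketch assumes the completion date is known and the file name already follows `EMAIL-WP-NNNN-<slug>.md`. The helper names and the example slug `mailbox-scanner` are made up for illustration.

```python
from datetime import date


def archived_name(workplan_file: str, completed: date) -> str:
    # Prefix the unchanged file name with the completion date as YYMMDD
    if not workplan_file.startswith("EMAIL-WP-"):
        raise ValueError("expected an EMAIL-WP- workplan file name")
    return f"{completed:%y%m%d}-{workplan_file}"


def adhoc_paths(day: date) -> tuple[str, str]:
    # (ad hoc workplan file, ad hoc workstream slug) for one calendar day
    return f"workplans/ADHOC-{day.isoformat()}.md", f"adhoc-{day.isoformat()}"


def adhoc_task_id(day: date, n: int) -> str:
    # Task ids are numbered T01, T02, ... within the day's ad hoc file
    return f"ADHOC-{day.isoformat()}-T{n:02d}"
```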
Ecosystem todos from other agents arrive as `[repo:email-connect]` hub tasks —
visible at session start. Pick one up by creating the workplan file, then registering
the workstream.
Task blocks use this shape:
```task
id: EMAIL-WP-NNNN-T01
status: wait | todo | progress | done | cancel
priority: high | medium | low
state_hub_task_id: "<uuid>" # written by fix-consistency — do not edit
```
Status progression is `todo` → `progress` → `done`; use `wait` for waiting or
blocked work and `cancel` for stopped work.
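Task blocks are plain `key: value` lines, so a small parser can catch invalid `status` or `priority` literals before `fix-consistency` runs. This is a hypothetical sketch, not the parser State Hub uses. It assumes values never contain ` #` and treats anything after that sequence as a comment.

```python
ALLOWED = {
    "status": {"wait", "todo", "progress", "done", "cancel"},
    "priority": {"high", "medium", "low"},
}


def parse_task_block(text: str) -> dict[str, str]:
    """Parse the key: value lines of a task block and validate enumerated fields."""
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"not a key: value line: {raw!r}")
        fields[key.strip()] = value.strip().strip('"')
    for key, allowed in ALLOWED.items():
        if key in fields and fields[key] not in allowed:
            raise ValueError(f"{key}={fields[key]!r} not in {sorted(allowed)}")
    return fields
```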
<!-- Ralph Loop rules and HEUREKA sequence: ~/.claude/CLAUDE.md — do not duplicate here -->

@@ -1,37 +1,18 @@
<!-- custodian-brief: generated by fix-consistency — do not edit manually -->
# Custodian Brief — email-connect
**Domain:** custodian
**Last synced:** 2026-06-01 22:46 UTC
**Domain:** infotech
**Last synced:** 2026-06-22 21:20 UTC
**State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
## Active Workstreams
### MVP Mailbox Evidence Scanner
Progress: 0/12 done | workstream_id: `c81788aa-0d0a-4493-bf41-ab6cc2068f2f`
**Open tasks:**
- · T01 - Repository Bootstrap `3a17215d`
- · T02 - Mailbox Connector `25a4da12`
- · T03 - Scan State and Storage `16b95a6b`
- · T04 - MIME and Header Parsing `5a50cd85`
- · T05 - Bounce and DSN Parser `8ea826d1`
- · T06 - Auto-Reply and Human Reply Classifier `4d94a332`
- · T07 - Complaint and Unsubscribe Classifier `8637d383`
- … and 5 more open tasks
### Repository Onboarding and Implementation Foundation
Progress: 1/4 done | workstream_id: `4533ceb6-bd86-49ee-a014-cffd68f84fbb`
**Open tasks:**
- · T02 - Define The Initial Runtime Architecture `fdfd8b96`
- · T03 - Model The Email Evidence Canon `ef1eb769`
- · T04 - Create The First Service Skeleton `4b94e544`
*(none — repo may need first-session setup)*
---
## MCP Orientation (when available)
If the state-hub MCP server is reachable, call:
`get_domain_summary("custodian")`
`get_domain_summary("infotech")`
This provides richer cross-domain context.
If the MCP call fails, use this file as your orientation source.

.repo-classification.yaml

@@ -0,0 +1,25 @@
# Repo classification (Repo Classification Standard v1.0).
repo_classification:
  standard: Repo Classification Standard
  version: '1.0'
  classified_at: '2026-06-22'
  classified_by: human
  category: tooling
  domain: infotech
  secondary_domains:
  - communication
  capability_tags:
  - evidence
  - traceability
  - source-management
  - automation
  business_stake:
  - technology
  - operations
  - legal
  business_mechanics:
  - operation
  - coordination
  notes: Headless, provider-neutral email communication and evidence service. Headless infra
    for developers -> domain infotech; communication is a secondary market.

@@ -2,41 +2,15 @@
## Repo Identity
**Purpose:** Headless, provider-neutral email communication and evidence service
for sending, tracking, diagnosing, and normalizing email-channel events without
overclaiming delivery, awareness, or result satisfaction.
**Purpose:** Headless provider-neutral email communication and evidence service for sending, tracking, diagnosing, and normalizing email-channel events.
**Domain:** custodian
**Domain:** infotech
**Repo slug:** email-connect
**Topic ID:** `cee7bedf-2b48-46ef-8601-006474f2ad7a`
**Workplan prefix:** `EMAIL-WP-`
---
## Repo Orientation
This repository is currently in planning/specification shape. Treat
`INTENT.md` as the canonical product intent and
`spec/ProductRequirementsDocument.md` as the detailed requirements source until
implementation architecture is added.
Preserve the core principle: email events are evidence, not result
satisfaction. Provider acceptance, MX acceptance, inbox placement, opens,
clicks, replies, complaints, suppressions, and bounces must remain distinct.
Do not collapse them into "delivered", "read", "accepted", or any similar
business conclusion.
Keep the product provider-neutral and adapter-friendly. It should be useful as
a standalone email communication/evidence service and as the reference
`coordination-engine` adapter. It should not drift into newsletter campaign
management, broad marketing automation, or legal/business outcome adjudication.
There is no runtime stack committed yet. Any stack, API, storage, or provider
choice should be introduced through a workplan and anchored in the current
intent/PRD.
---
## State Hub Integration
The Custodian State Hub tracks work across all domains. Interact via HTTP REST —
@@ -50,8 +24,8 @@ there is no MCP server for Codex agents.
### Orient at session start
```bash
# Offline brief, when present
test -f .custodian-brief.md && cat .custodian-brief.md
# Offline brief — works without hub connection
cat .custodian-brief.md
# Active workstreams for this domain
curl -s "http://127.0.0.1:8000/workstreams/?topic_id=cee7bedf-2b48-46ef-8601-006474f2ad7a&status=active" \
@@ -106,7 +80,7 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
## Session Protocol
**Start:**
1. `test -f .custodian-brief.md && cat .custodian-brief.md` — domain goal and open workstreams, when present
1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe)
2. Check inbox: `GET /messages/?to_agent=email-connect&unread_only=true`; mark read
3. Scan workplans: `ls workplans/` — note `status: ready`, `active`, or `blocked` files and open tasks
4. Check human-needed tasks: `GET /tasks/?needs_human=true`
@@ -127,6 +101,63 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
---
## Credential and access routing
**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
for inference. Run this check **before** requesting secrets, API keys, SSH access,
login tokens, or database passwords — in any repo, not only `ops-warden`.
ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
other credential need belongs to another subsystem. **Do not** message
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
### Lookup (do this first)
```bash
warden route find "<describe your need>" --json
warden route show <catalog-id> --json
```
Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=email-connect` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
### Quick routing table
| I need… | Owner | ops-warden executes? |
| --- | --- | --- |
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
| Authorization decision | flex-auth | No — route only |
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
### Anti-patterns (do not do these)
- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
- Pasting secrets into Git, State Hub, workplans, logs, or chat
### Other capabilities (reuse-surface)
Non-credential capabilities are usually discovered through **reuse-surface** federation
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
every repo's agent instructions because it is high-frequency, high-risk, and easy to
get wrong.
**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
<!-- REPO-AGENTS-EXTENSIONS -->
<!-- Append repo-specific agent instructions below this marker.
The state-hub template sync preserves content after this line. -->
---
## Workplan Convention (ADR-001)
Work items originate as files in this repo — not in the hub. The hub is a
@@ -150,7 +181,7 @@ anything needing analysis, design, approval, dependencies, or multiple phases.
id: EMAIL-WP-NNNN
type: workplan
title: "..."
domain: custodian
domain: infotech
repo: email-connect
status: proposed | ready | active | blocked | backlog | finished | archived
owner: codex

@@ -1,20 +1,12 @@
# email-connect - Claude Code Instructions
# email-connect Claude Code Instructions
This repository uses `AGENTS.md` as the canonical local agent instruction file.
Claude Code agents should read `AGENTS.md` first and follow the same State Hub
identity, workplan convention, and repo-specific semantic guardrails.
State Hub identity:
- Domain: `custodian`
- Repo slug: `email-connect`
- Topic ID: `cee7bedf-2b48-46ef-8601-006474f2ad7a`
- Workplan prefix: `EMAIL-WP-`
Core product rule: email events are evidence, not result satisfaction. Do not
treat provider acceptance, MX acceptance, inbox placement, opens, clicks,
replies, complaints, or suppressions as proof of human awareness or business
outcome completion.
Start with `INTENT.md`, then `spec/ProductRequirementsDocument.md`, then the
active files under `workplans/`.
@SCOPE.md
@.claude/rules/repo-identity.md
@.claude/rules/session-protocol.md
@.claude/rules/first-session.md
@.claude/rules/workplan-convention.md
@.claude/rules/stack-and-commands.md
@.claude/rules/architecture.md
@.claude/rules/repo-boundary.md
@.claude/rules/credential-routing.md
@.claude/rules/agents.md

@@ -13,12 +13,18 @@ overclaiming delivery, awareness, or coordination success.
PYTHONPATH=src python3 -m unittest discover -s tests
PYTHONPATH=src python3 -m email_connect.cli adapter-descriptor
PYTHONPATH=src python3 -m email_connect.cli scan-mailbox --config config/mailbox.example.yml --out reports/
PYTHONPATH=src python3 -m email_connect.cli scan-mailbox --config config/mailbox.example.yml --expected-recipients tests/fixtures/expected_recipients.txt --out reports/
```
The example config uses `tests/fixtures/mailbox` as a mailbox source. Runtime
state is written to `.email-connect/state.sqlite`; generated CSV reports are
written to `reports/`.
Expected recipients are optional. When provided as a text file or CSV, reports
include a `known_recipient` column, place known recipients first, and add
`undef.no_signal` diagnostics for expected recipients with no mailbox evidence.
See [Mailbox Report Tutorial](docs/mailbox-report-tutorial.md).
For a live mailbox, set `mailbox.protocol: imap`, configure host, port, folder,
and credential environment variable names, then export the credentials before
running `scan-mailbox`. IMAP scans select the configured folder read-only and
@@ -35,6 +41,8 @@ marking messages seen are intentionally unsupported in this MVP.
- SQLite state store with scan cursor, message/evidence deduplication, and
endpoint quality hints.
- CSV report generation, including `--report-only-new`.
- Optional expected-recipient text/CSV input, `known_recipient` report
filtering, no-evidence diagnostics, and datetime range filtering.
- Golden fixture tests for hard bounce, soft bounce, delayed delivery, final
failure, complaint, unsubscribe, challenge-response, unknown return,
parse-failure, out-of-office, and human reply signals.

@@ -15,12 +15,18 @@ scan:
  mode: incremental
  max_messages_per_run: 5000
  since: null
  from: null
  to: null
  include_seen: true
  mark_seen: false
  store_raw_headers: true
  store_raw_body: false
  store_raw_message_ref: true
expected_recipients:
  path: null
  csv_column: email
storage:
  path: .email-connect/state.sqlite

@@ -37,6 +37,7 @@ coordination runtime decides whether those facts satisfy a coordination case.
| `unknown_return_message` | `notification.endpoint.unknown` | `undef.conflicting_evidence` |
| `challenge_response` | `interaction.unverified_actor_interaction` | `undef.identity_uncertain` |
| `parse_failed` | `diagnostic.message.parse_failed` | `undef.parse_failed` |
| expected recipient with no evidence | `diagnostic.expected_recipient.no_evidence` | `undef.no_signal` |
## Overclaim Prevention
@@ -47,6 +48,8 @@ coordination runtime decides whether those facts satisfy a coordination case.
- Human reply does not prove legal acceptance.
- Unknown return messages remain visible.
- Parse failures are diagnostic rows, not delivery or interaction outcomes.
- Expected-recipient no-evidence rows mean no mailbox evidence was found in the
inspected range, not that notification succeeded or failed.
- Scanner and proxy interactions must stay below identity-bound interaction.
## Endpoint Quality Hints

@@ -0,0 +1,139 @@
# Mailbox Report Tutorial
This tutorial shows how to generate an email-channel evidence report from a
return mailbox or from the bundled fixture mailbox.
## 1. Verify The Scanner
Run the tests and print the adapter descriptor:
```bash
PYTHONPATH=src python3 -m unittest discover -s tests
PYTHONPATH=src python3 -m email_connect.cli adapter-descriptor
```
## 2. Start With Fixtures
The example config uses `tests/fixtures/mailbox`:
```bash
PYTHONPATH=src python3 -m email_connect.cli scan-mailbox \
--config config/mailbox.example.yml \
--full-rescan \
--out reports/
```
The scanner writes a timestamped CSV file to `reports/`.
## 3. Configure A Live IMAP Mailbox
Copy `config/mailbox.example.yml` and set:
```yaml
mailbox:
  protocol: imap
  host: imap.example.com
  port: 993
  tls: true
  username_env: EMAIL_CONNECT_IMAP_USER
  password_env: EMAIL_CONNECT_IMAP_PASSWORD
  folder: INBOX
```
Then export credentials:
```bash
export EMAIL_CONNECT_IMAP_USER='mailbox@example.com'
export EMAIL_CONNECT_IMAP_PASSWORD='app-password'
```
IMAP scans select the folder read-only and fetch messages with `BODY.PEEK[]`.
The scanner does not mark messages seen, move messages, or delete messages.
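The read-only guarantee rests on two standard IMAP features. First, the folder is opened read-only: Python's `imaplib` sends `EXAMINE` instead of `SELECT` when `readonly=True`. Second, messages are fetched with `BODY.PEEK[]`, which does not set the `\Seen` flag. The sketch below illustrates both with `imaplib`. It is not the scanner's actual connector, and `open_mailbox` and `fetch_readonly` are hypothetical names.

```python
import imaplib
import os


def open_mailbox(host: str, port: int, username_env: str, password_env: str) -> imaplib.IMAP4_SSL:
    # Credentials are read from the environment variables named in the config
    conn = imaplib.IMAP4_SSL(host, port)
    conn.login(os.environ[username_env], os.environ[password_env])
    return conn


def fetch_readonly(conn, folder: str = "INBOX") -> list[bytes]:
    """Fetch raw messages from `folder` without changing any flags."""
    typ, _ = conn.select(folder, readonly=True)  # EXAMINE: folder opened read-only
    if typ != "OK":
        raise RuntimeError(f"cannot open folder {folder!r}")
    typ, data = conn.search(None, "ALL")
    if typ != "OK":
        raise RuntimeError("search failed")
    messages: list[bytes] = []
    for num in data[0].split():
        # BODY.PEEK[] returns the full message and, unlike BODY[], leaves \Seen unset
        typ, parts = conn.fetch(num, "(BODY.PEEK[])")
        if typ == "OK" and parts and isinstance(parts[0], tuple):
            messages.append(parts[0][1])
    return messages
```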
## 4. Add Expected Recipients
Expected recipients are optional. A newline-separated file can look like:
```text
missing@example.com
recipient@example.com
```
Run:
```bash
PYTHONPATH=src python3 -m email_connect.cli scan-mailbox \
--config config/mailbox.example.yml \
--expected-recipients recipients.txt \
--out reports/
```
CSV input is also supported:
```csv
email,name
missing@example.com,Missing User
recipient@example.com,Known Recipient
```
Run:
```bash
PYTHONPATH=src python3 -m email_connect.cli scan-mailbox \
--config config/mailbox.example.yml \
--expected-recipients recipients.csv \
--expected-recipient-column email \
--out reports/
```
Invalid recipient rows are ignored and printed as warnings.
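The loading rules in this section can be sketched as follows, using the same `EMAIL_RE` pattern as `expected_recipients.py`. Treat it as an illustration under assumptions. The function name is hypothetical. Blank lines are assumed to be skipped rather than reported. Addresses are assumed to be lower-cased before validation.

```python
import csv
import io
import re
from typing import Optional

# Same pattern as EMAIL_RE in expected_recipients.py
EMAIL_RE = re.compile(r"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$", re.IGNORECASE)


def parse_recipients(text: str, *, csv_column: Optional[str] = None) -> tuple[list[str], list[str]]:
    """Return (valid, invalid) entries from newline-separated text or CSV content."""
    if csv_column is None:
        values = text.splitlines()
    else:
        values = [row.get(csv_column) or "" for row in csv.DictReader(io.StringIO(text))]
    valid: list[str] = []
    invalid: list[str] = []
    for value in values:
        address = value.strip().lower()
        if not address:
            continue  # assumption: blank entries are skipped silently
        if EMAIL_RE.match(address):
            valid.append(address)
        else:
            invalid.append(value.strip())  # reported as a warning, not loaded
    return valid, invalid
```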
## 5. Limit The Time Range
Use an inclusive datetime range:
```bash
PYTHONPATH=src python3 -m email_connect.cli scan-mailbox \
--config config/mailbox.example.yml \
--from 2026-06-02T10:00:00Z \
--to 2026-06-02T11:00:00Z \
--out reports/
```
`--since` remains a compatibility alias for the lower bound. When a range is
active, messages with no parseable `Date` header are excluded because the
scanner cannot confirm that they originated inside the requested window.
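The inclusive window and the rule for undated messages can be sketched with `email.utils.parsedate_to_datetime`. The helper names are hypothetical. Treating a `Date` header without a zone (`-0000`) as UTC is an assumption, not documented scanner behaviour.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_bound(value: Optional[str]) -> Optional[datetime]:
    # Accepts ISO 8601 such as 2026-06-02T10:00:00Z; "Z" is rewritten for older Pythons
    if value is None:
        return None
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def in_range(date_header: Optional[str], start: Optional[datetime], end: Optional[datetime]) -> bool:
    """Inclusive check of a message's Date header against [start, end]."""
    if start is None and end is None:
        return True  # no range active: every message is kept
    try:
        sent = parsedate_to_datetime(date_header) if date_header else None
    except (TypeError, ValueError):  # older Pythons raise TypeError on bad input
        sent = None
    if sent is None:
        return False  # an unparseable Date cannot be placed inside the window
    if sent.tzinfo is None:
        sent = sent.replace(tzinfo=timezone.utc)  # assumption: zone-less dates are UTC
    if start is not None and sent < start:
        return False
    if end is not None and sent > end:
        return False
    return True
```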
## 6. Read The Report
Key columns:
- `known_recipient`: `true` when the address was supplied in the expected list.
- `normalized_event_type`: the email evidence or diagnostic event.
- `assessment_category` and `assessment_subclass`: advisory interpretation.
- `affected_email_address`: the endpoint the row is about.
Known recipients appear first by default so spreadsheet filtering is easy.
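Known-first ordering needs only a stable sort on the `known_recipient` column. This sketch assumes report rows are dictionaries whose `known_recipient` value is the string `true` or `false`, as written to the CSV. `order_rows` is a hypothetical name.

```python
def order_rows(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    # False sorts before True, so rows with known_recipient == "true" come first;
    # sorted() is stable, so each group keeps its original relative order
    return sorted(rows, key=lambda row: row.get("known_recipient") != "true")
```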
Expected recipients with no mailbox evidence appear as:
```text
normalized_event_type: diagnostic.expected_recipient.no_evidence
assessment_category: undef
assessment_subclass: undef.no_signal
evidence_strength: none
known_recipient: true
```
That row means only that no mailbox evidence was found for the supplied address
inside the inspected range. It is not evidence of delivery success, delivery
failure, recipient awareness, or legal acceptance.
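The sketch below shows one way such rows could be derived. Each expected address that has no evidence row in the inspected range gets one diagnostic row with the field values listed above. The function name is illustrative, not taken from the scanner. So is the assumption that `evidence_rows` has already been filtered to the datetime range.

```python
def no_evidence_rows(expected: list[str], evidence_rows: list[dict[str, str]]) -> list[dict[str, str]]:
    # Addresses that already have at least one evidence row (case-insensitive)
    seen = {
        row["affected_email_address"].lower()
        for row in evidence_rows
        if row.get("affected_email_address")
    }
    # One diagnostic row per expected address without evidence; not a delivery verdict
    return [
        {
            "normalized_event_type": "diagnostic.expected_recipient.no_evidence",
            "assessment_category": "undef",
            "assessment_subclass": "undef.no_signal",
            "evidence_strength": "none",
            "known_recipient": "true",
            "affected_email_address": address,
        }
        for address in expected
        if address.lower() not in seen
    ]
```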
## 7. Troubleshooting
- Empty report: check folder, time range, and whether incremental cursor state
already skipped older messages. Try `--full-rescan`.
- IMAP credential error: verify the environment variable names and values.
- Missing expected rows: check the recipient file path and CSV column name.
- Unexpected no-evidence rows: confirm that the relevant mailbox evidence is
inside the configured datetime range.

registry/README.md

@@ -0,0 +1,12 @@
# Capability Registry
Markdown-first capability index for federation and reuse planning.
## Authoring
1. Copy a capability entry template (see reuse-surface `templates/capability-entry.template.md`).
2. Add the row to `indexes/capabilities.yaml`.
3. Run `reuse-surface validate` from a checkout with the CLI installed.
4. Merge to `main` and verify publish with `reuse-surface establish --publish-check`.
Federation contract: reuse-surface `docs/RegistryFederation.md`.

@@ -0,0 +1,4 @@
version: 1
updated: '2026-06-16'
domain: helix_forge
capabilities: []

@@ -13,6 +13,7 @@ EMITTED_EVENT_TYPES = [
    "interaction.out_of_office_received",
    "notification.endpoint.unknown",
    "diagnostic.message.parse_failed",
    "diagnostic.expected_recipient.no_evidence",
]

@@ -18,6 +18,10 @@ def main(argv: list[str] | None = None) -> int:
    scan.add_argument("--out", default=None)
    scan.add_argument("--full-rescan", action="store_true")
    scan.add_argument("--since", default=None)
    scan.add_argument("--from", dest="range_from", default=None)
    scan.add_argument("--to", dest="range_to", default=None)
    scan.add_argument("--expected-recipients", default=None)
    scan.add_argument("--expected-recipient-column", default=None)
    scan.add_argument("--report-only-new", action="store_true")
    scan.add_argument("--dry-run", action="store_true")
    scan.add_argument("--fixture-dir", default=None)
@@ -39,6 +43,10 @@ def main(argv: list[str] | None = None) -> int:
            dry_run=args.dry_run,
            fixture_dir=args.fixture_dir,
            since=args.since,
            range_from=args.range_from,
            range_to=args.range_to,
            expected_recipients_path=args.expected_recipients,
            expected_recipient_column=args.expected_recipient_column,
        )
        print(f"scan_id={result.scan.scan_id}")
        print(f"messages_seen={result.scan.messages_seen}")
@@ -47,6 +55,8 @@ def main(argv: list[str] | None = None) -> int:
        print(f"evidence_events_created={result.scan.evidence_events_created}")
        if result.report_path:
            print(f"report_path={Path(result.report_path)}")
        for warning in result.warnings:
            print(f"warning={warning}")
        return 0
    return 2

@@ -22,6 +22,8 @@ class ScanConfig:
mode: str = "incremental"
max_messages_per_run: int = 5000
since: str | None = None
range_from: str | None = None
range_to: str | None = None
include_seen: bool = True
mark_seen: bool = False
store_raw_headers: bool = True
@@ -47,6 +49,12 @@ class SourceConfig:
fixture_dir: str | None = None
@dataclass(frozen=True)
class ExpectedRecipientsConfig:
path: str | None = None
csv_column: str = "email"
@dataclass(frozen=True)
class AppConfig:
mailbox: MailboxConfig
@@ -54,6 +62,7 @@ class AppConfig:
storage: StorageConfig
reports: ReportsConfig
source: SourceConfig = SourceConfig()
expected_recipients: ExpectedRecipientsConfig = ExpectedRecipientsConfig()
def load_config(path: str | Path) -> AppConfig:
@@ -63,6 +72,7 @@ def load_config(path: str | Path) -> AppConfig:
    storage = data.get("storage", {})
    reports = data.get("reports", {})
    source = data.get("source", {})
    expected_recipients = data.get("expected_recipients", {})
    return AppConfig(
        mailbox=MailboxConfig(
            id=str(mailbox.get("id", "return-mailbox-default")),
@@ -78,6 +88,8 @@ def load_config(path: str | Path) -> AppConfig:
            mode=str(scan.get("mode", "incremental")),
            max_messages_per_run=int(scan.get("max_messages_per_run", 5000)),
            since=scan.get("since"),
            range_from=scan.get("from") or scan.get("range_from"),
            range_to=scan.get("to") or scan.get("range_to"),
            include_seen=bool(scan.get("include_seen", True)),
            mark_seen=bool(scan.get("mark_seen", False)),
            store_raw_headers=bool(scan.get("store_raw_headers", True)),
@@ -92,6 +104,10 @@ def load_config(path: str | Path) -> AppConfig:
            timestamp_timezone=str(reports.get("timestamp_timezone", "UTC")),
        ),
        source=SourceConfig(fixture_dir=source.get("fixture_dir")),
        expected_recipients=ExpectedRecipientsConfig(
            path=expected_recipients.get("path"),
            csv_column=str(expected_recipients.get("csv_column", "email")),
        ),
    )

@@ -57,6 +57,8 @@ class MailboxScan:
    evidence_events_created: int = 0
    report_path: str | None = None
    since: datetime | None = None
    range_start: datetime | None = None
    range_end: datetime | None = None


@dataclass(frozen=True)

@@ -0,0 +1,79 @@
from __future__ import annotations

import csv
import re
from dataclasses import dataclass, field
from pathlib import Path

EMAIL_RE = re.compile(r"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$", re.IGNORECASE)


@dataclass(frozen=True)
class ExpectedRecipients:
    addresses: tuple[str, ...] = ()
    invalid_entries: tuple[str, ...] = ()


def load_expected_recipients(
    path: str | Path | None,
    *,
    csv_column: str | None = "email",
) -> ExpectedRecipients:
    if not path:
        return ExpectedRecipients()
    recipient_path = Path(path)
    if recipient_path.suffix.lower() == ".csv":
        return _load_csv_recipients(recipient_path, csv_column=csv_column or "email")
    return _load_line_recipients(recipient_path)


def normalize_email_address(value: str | None) -> str | None:
    if value is None:
        return None
    normalized = value.strip().lower()
    if not normalized:
        return None
    return normalized if EMAIL_RE.fullmatch(normalized) else None


@dataclass
class _RecipientCollector:
    addresses: dict[str, None] = field(default_factory=dict)
    invalid_entries: list[str] = field(default_factory=list)

    def add(self, value: str | None, *, source: str) -> None:
        normalized = normalize_email_address(value)
        if normalized:
            self.addresses[normalized] = None
            return
        if value and value.strip():
            self.invalid_entries.append(f"{source}: {value.strip()}")

    def result(self) -> ExpectedRecipients:
        return ExpectedRecipients(
            addresses=tuple(self.addresses.keys()),
            invalid_entries=tuple(self.invalid_entries),
        )


def _load_line_recipients(path: Path) -> ExpectedRecipients:
    collector = _RecipientCollector()
    for line_number, raw_line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        collector.add(line, source=f"{path}:{line_number}")
    return collector.result()


def _load_csv_recipients(path: Path, *, csv_column: str) -> ExpectedRecipients:
    collector = _RecipientCollector()
    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        if reader.fieldnames is None:
            return collector.result()
        column = csv_column if csv_column in reader.fieldnames else reader.fieldnames[0]
        for line_number, row in enumerate(reader, start=2):
            collector.add(row.get(column), source=f"{path}:{line_number}:{column}")
    return collector.result()

@@ -20,6 +20,7 @@ REPORT_COLUMNS = [
    "assessment_category",
    "assessment_subclass",
    "affected_email_address",
    "known_recipient",
    "original_message_id",
    "original_recipient",
    "smtp_status_code",
@@ -49,16 +50,17 @@ def write_evidence_report(
    scan_id: str,
    mailbox_id: str,
    generated_at: datetime | None = None,
    expected_recipients: set[str] | None = None,
) -> Path:
    generated = generated_at or datetime.now(UTC)
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
-    path = out_dir / report_filename(generated)
    path = _unique_report_path(out_dir / report_filename(generated))
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=REPORT_COLUMNS)
        writer.writeheader()
-        for row in rows:
        for row in _ordered_rows(rows, expected_recipients=expected_recipients or set()):
            writer.writerow(_report_row(row, scan_id=scan_id, mailbox_id=mailbox_id, generated_at=generated))
    return path
@@ -66,6 +68,7 @@ def write_evidence_report(
def _report_row(row: dict, *, scan_id: str, mailbox_id: str, generated_at: datetime) -> dict:
    metadata = _json(row.get("metadata_json"))
    notes = _json(row.get("notes_json"))
    known_recipient = _known_recipient(row, expected_recipients=set(row.get("_expected_recipients", [])))
    return {
        "report_generated_at": generated_at.isoformat(),
        "scan_id": scan_id,
@@ -81,6 +84,7 @@ def _report_row(row: dict, *, scan_id: str, mailbox_id: str, generated_at: datet
        "assessment_category": row.get("assessment_category", ""),
        "assessment_subclass": row.get("assessment_subclass", ""),
        "affected_email_address": row.get("affected_email_address") or "",
        "known_recipient": "true" if known_recipient else "false",
        "original_message_id": row.get("original_message_id") or "",
        "original_recipient": metadata.get("original_recipient", ""),
        "smtp_status_code": metadata.get("smtp_status_code") or "",
@@ -98,6 +102,29 @@ def _report_row(row: dict, *, scan_id: str, mailbox_id: str, generated_at: datet
    }


def _ordered_rows(rows: list[dict], *, expected_recipients: set[str]) -> list[dict]:
    enriched = [dict(row, _expected_recipients=tuple(expected_recipients)) for row in rows]
    if not expected_recipients:
        return enriched
    return sorted(
        enriched,
        key=lambda row: (
            not _known_recipient(row, expected_recipients=expected_recipients),
            str(row.get("affected_email_address") or ""),
            str(row.get("observed_at") or ""),
            str(row.get("event_type") or ""),
            str(row.get("deduplication_key") or ""),
        ),
    )


def _known_recipient(row: dict, *, expected_recipients: set[str]) -> bool:
    if row.get("known_recipient") is True:
        return True
    address = str(row.get("affected_email_address") or "").lower()
    return bool(address and address in expected_recipients)


def _json(value: str | None) -> dict | list:
    if not value:
        return {}
@@ -105,3 +132,15 @@ def _json(value: str | None) -> dict | list:
        return json.loads(value)
    except json.JSONDecodeError:
        return {}


def _unique_report_path(path: Path) -> Path:
    if not path.exists():
        return path
    stem = path.stem
    suffix = path.suffix
    for index in range(1, 1000):
        candidate = path.with_name(f"{stem}-{index:02d}{suffix}")
        if not candidate.exists():
            return candidate
    raise RuntimeError(f"Could not allocate unique report filename for {path}")

@@ -1,7 +1,9 @@
from __future__ import annotations

import json
from dataclasses import dataclass
from datetime import UTC, datetime
from pathlib import Path
from uuid import uuid4

from .config import AppConfig
@@ -9,6 +11,7 @@ from .evidence import endpoint_quality_from_candidate
from .mailbox import source_for_config
from .models import MailboxScan
from .parser import parse_message_bytes
from .recipients import load_expected_recipients
from .reporting import write_evidence_report
from .storage import StateStore
@@ -17,6 +20,7 @@ from .storage import StateStore
class ScanResult:
    scan: MailboxScan
    report_path: Path | None
    warnings: tuple[str, ...] = ()


def scan_mailbox(
@@ -28,10 +32,24 @@ def scan_mailbox(
    dry_run: bool = False,
    fixture_dir: str | None = None,
    since: str | None = None,
    range_from: str | None = None,
    range_to: str | None = None,
    expected_recipients_path: str | None = None,
    expected_recipient_column: str | None = None,
) -> ScanResult:
    started_at = datetime.now(UTC)
    scan_id = str(uuid4())
-    since_at = _parse_since(since or config.scan.since)
    range_start = _parse_datetime(range_from or since or config.scan.range_from or config.scan.since)
    range_end = _parse_datetime(range_to or config.scan.range_to)
    if range_start and range_end and range_start > range_end:
        raise ValueError("scan datetime range lower bound must be before or equal to upper bound.")
    since_at = range_start
    expected = load_expected_recipients(
        expected_recipients_path or config.expected_recipients.path,
        csv_column=expected_recipient_column or config.expected_recipients.csv_column,
    )
    expected_addresses = set(expected.addresses)
    warnings = tuple(f"invalid expected recipient ignored: {entry}" for entry in expected.invalid_entries)

    source = source_for_config(config, fixture_dir_override=fixture_dir)
    store = StateStore(config.storage.path)
@@ -58,7 +76,7 @@ def scan_mailbox(
                raw_message_ref=message.raw_message_ref,
                imap_uid=message.imap_uid,
            )
-            if since_at and inbound.received_at and inbound.received_at < since_at:
            if not _in_range(inbound.received_at, range_start=range_start, range_end=range_end):
                continue
            if dry_run:
                messages_parsed += 1
@@ -91,14 +109,29 @@ def scan_mailbox(
        report_path = None
        if not dry_run:
            range_evidence_rows = store.evidence_rows(range_start=range_start, range_end=range_end)
            report_rows = store.evidence_rows(
                deduplication_keys=new_evidence_keys if report_only_new else None,
                range_start=range_start,
                range_end=range_end,
            )
            report_rows = [
                *report_rows,
                *_no_evidence_rows(
                    mailbox_id=config.mailbox.id,
                    expected_addresses=expected_addresses,
                    evidence_rows=range_evidence_rows,
                    observed_at=datetime.now(UTC),
                    range_start=range_start,
                    range_end=range_end,
                ),
            ]
            report_path = write_evidence_report(
                report_rows,
                output_dir=output_dir or config.reports.output_dir,
                scan_id=scan_id,
                mailbox_id=config.mailbox.id,
                expected_recipients=expected_addresses,
            )
        finished_at = datetime.now(UTC)
        scan = MailboxScan(
@@ -115,10 +148,12 @@ def scan_mailbox(
            evidence_events_created=evidence_created,
            report_path=str(report_path) if report_path else None,
            since=since_at,
            range_start=range_start,
            range_end=range_end,
        )
        if not dry_run:
            store.insert_scan(scan)
-        return ScanResult(scan=scan, report_path=report_path)
        return ScanResult(scan=scan, report_path=report_path, warnings=warnings)
    finally:
        store.close()
@@ -142,7 +177,7 @@ def _enrich_candidate(candidate, inbound, parsed):
    )


-def _parse_since(value: str | None) -> datetime | None:
def _parse_datetime(value: str | None) -> datetime | None:
    if not value:
        return None
    normalized = value.strip()
@@ -154,3 +189,81 @@ def _parse_since(value: str | None) -> datetime | None:
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=UTC)
    return parsed.astimezone(UTC)


def _in_range(
    received_at: datetime | None,
    *,
    range_start: datetime | None,
    range_end: datetime | None,
) -> bool:
    if range_start is None and range_end is None:
        return True
    if received_at is None:
        return False
    if range_start is not None and received_at < range_start:
        return False
    if range_end is not None and received_at > range_end:
        return False
    return True


def _no_evidence_rows(
    *,
    mailbox_id: str,
    expected_addresses: set[str],
    evidence_rows: list[dict],
    observed_at: datetime,
    range_start: datetime | None,
    range_end: datetime | None,
) -> list[dict]:
    if not expected_addresses:
        return []
    known_evidence_addresses = {
        str(row.get("affected_email_address") or "").lower()
        for row in evidence_rows
        if row.get("affected_email_address")
    }
    rows = []
    for address in sorted(expected_addresses - known_evidence_addresses):
        rows.append(_no_evidence_row(mailbox_id, address, observed_at, range_start=range_start, range_end=range_end))
    return rows


def _no_evidence_row(
    mailbox_id: str,
    address: str,
    observed_at: datetime,
    *,
    range_start: datetime | None,
    range_end: datetime | None,
) -> dict:
    range_key = "|".join([
        range_start.isoformat() if range_start else "",
        range_end.isoformat() if range_end else "",
    ])
    return {
        "mailbox_message_id": "",
        "event_type": "diagnostic.expected_recipient.no_evidence",
        "assessment_category": "undef",
        "assessment_subclass": "undef.no_signal",
        "affected_email_address": address,
        "original_message_id": "",
        "confidence": "high",
        "evidence_strength": "none",
        "occurred_at": "",
        "observed_at": observed_at.isoformat(),
        "deduplication_key": f"{mailbox_id}|expected_recipient|no_evidence|{address}|{range_key}",
        "raw_message_ref": "",
        "notes_json": json.dumps([
            "Expected recipient was supplied by the operator; no mailbox evidence was found in the inspected range.",
            "This is not evidence of delivery success or delivery failure.",
        ]),
        "metadata_json": json.dumps({
            "message_class": "expected_recipient_no_evidence",
            "original_recipient": address,
            "range_start": range_start.isoformat() if range_start else None,
            "range_end": range_end.isoformat() if range_end else None,
        }),
        "known_recipient": True,
    }

@@ -42,7 +42,9 @@ class StateStore:
                messages_parsed integer not null,
                evidence_events_created integer not null,
                report_path text,
-                since text
                since text,
                range_start text,
                range_end text
            );

            create table if not exists mailbox_messages (
@@ -121,8 +123,18 @@ class StateStore:
            );
            """
        )
        self._ensure_column("mailbox_scans", "range_start", "text")
        self._ensure_column("mailbox_scans", "range_end", "text")
        self.conn.commit()

    def _ensure_column(self, table: str, column: str, column_type: str) -> None:
        columns = {
            str(row["name"])
            for row in self.conn.execute(f"pragma table_info({table})").fetchall()
        }
        if column not in columns:
            self.conn.execute(f"alter table {table} add column {column} {column_type}")

    def upsert_message(self, message: InboundMailboxMessage) -> bool:
        existing = self.conn.execute(
            "select mailbox_message_id from mailbox_messages where deduplication_key = ?",
@@ -214,7 +226,7 @@ class StateStore:
    def insert_scan(self, scan: MailboxScan) -> None:
        self.conn.execute(
            """
-            insert or replace into mailbox_scans values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
            insert or replace into mailbox_scans values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
            """,
            (
                scan.scan_id,
@@ -230,6 +242,8 @@ class StateStore:
                scan.evidence_events_created,
                scan.report_path,
                _dt(scan.since),
                _dt(scan.range_start),
                _dt(scan.range_end),
            ),
        )
        self.conn.commit()
@@ -304,7 +318,13 @@ class StateStore:
        ).fetchall()
        return [dict(row) for row in rows]

-    def evidence_rows(self, *, deduplication_keys: list[str] | None = None) -> list[dict]:
    def evidence_rows(
        self,
        *,
        deduplication_keys: list[str] | None = None,
        range_start: datetime | None = None,
        range_end: datetime | None = None,
    ) -> list[dict]:
        if deduplication_keys is not None:
            if not deduplication_keys:
                return []
@@ -317,9 +337,9 @@ class StateStore:
                """,
                deduplication_keys,
            ).fetchall()
-            return [dict(row) for row in rows]
            return _filter_rows_by_range([dict(row) for row in rows], range_start=range_start, range_end=range_end)
        rows = self.conn.execute("select * from evidence_candidates order by observed_at, event_type").fetchall()
-        return [dict(row) for row in rows]
        return _filter_rows_by_range([dict(row) for row in rows], range_start=range_start, range_end=range_end)


def _dt(value: datetime | None) -> str | None:
@@ -330,6 +350,28 @@ def _parse_dt(value: str | None) -> datetime | None:
    return datetime.fromisoformat(value) if value else None


def _filter_rows_by_range(
    rows: list[dict],
    *,
    range_start: datetime | None,
    range_end: datetime | None,
) -> list[dict]:
    if range_start is None and range_end is None:
        return rows
    return [row for row in rows if _row_in_range(row, range_start=range_start, range_end=range_end)]


def _row_in_range(row: dict, *, range_start: datetime | None, range_end: datetime | None) -> bool:
    occurred_at = _parse_dt(row.get("occurred_at"))
    if occurred_at is None:
        return False
    if range_start is not None and occurred_at < range_start:
        return False
    if range_end is not None and occurred_at > range_end:
        return False
    return True


def _merge_endpoint_quality(
    existing,
    update: EndpointQualityUpdate,

@@ -0,0 +1,5 @@
email,name
optout@example.com,Opt Out
csv-absent@example.com,Missing From Mailbox
OPTOut@example.com,Duplicate Case Variant
not-an-address,Invalid

@@ -0,0 +1,4 @@
missing@example.com
absent@example.com
MISSING@example.com
not-an-address

tests/test_recipients.py

@@ -0,0 +1,31 @@
from __future__ import annotations

import unittest
from pathlib import Path

from email_connect.recipients import load_expected_recipients, normalize_email_address

FIXTURES = Path(__file__).parent / "fixtures"


class RecipientTests(unittest.TestCase):
    def test_normalizes_email_addresses(self) -> None:
        self.assertEqual(normalize_email_address(" USER@Example.COM "), "user@example.com")
        self.assertIsNone(normalize_email_address("not-an-address"))

    def test_loads_line_separated_recipients(self) -> None:
        recipients = load_expected_recipients(FIXTURES / "expected_recipients.txt")
        self.assertEqual(recipients.addresses, ("missing@example.com", "absent@example.com"))
        self.assertEqual(len(recipients.invalid_entries), 1)

    def test_loads_csv_recipients(self) -> None:
        recipients = load_expected_recipients(FIXTURES / "expected_recipients.csv", csv_column="email")
        self.assertEqual(recipients.addresses, ("optout@example.com", "csv-absent@example.com"))
        self.assertEqual(len(recipients.invalid_entries), 1)


if __name__ == "__main__":
    unittest.main()

@@ -11,6 +11,7 @@ from email_connect.storage import StateStore
FIXTURES = Path(__file__).parent / "fixtures" / "mailbox"
RECIPIENT_FIXTURES = Path(__file__).parent / "fixtures"


class ScannerTests(unittest.TestCase):
@@ -38,6 +39,10 @@ class ScannerTests(unittest.TestCase):
            self.assertEqual(full.scan.messages_new, 0)
            self.assertEqual(full.scan.evidence_events_created, 0)
            self.assertTrue(first.report_path and first.report_path.exists())
            with first.report_path.open(newline="", encoding="utf-8") as fh:
                first_rows = list(DictReader(fh))
            self.assertTrue(first_rows)
            self.assertTrue(all(row["known_recipient"] == "false" for row in first_rows))
            self.assertTrue(full.report_path and full.report_path.exists())
            with full.report_path.open(newline="", encoding="utf-8") as fh:
                self.assertEqual(list(DictReader(fh)), [])
@@ -65,6 +70,110 @@ class ScannerTests(unittest.TestCase):
            self.assertEqual(rows["complained@example.com"]["suppression_state"], "suppressed")
            self.assertEqual(rows["optout@example.com"]["suppression_state"], "opted_out")

    def test_expected_recipients_sort_first_and_get_no_evidence_rows(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            config = AppConfig(
                mailbox=MailboxConfig(id="test-mailbox", protocol="fixture"),
                scan=ScanConfig(),
                storage=StorageConfig(path=str(root / "state.sqlite")),
                reports=ReportsConfig(output_dir=str(root / "reports")),
                source=SourceConfig(fixture_dir=str(FIXTURES)),
            )
            result = scan_mailbox(
                config,
                full_rescan=True,
                expected_recipients_path=str(RECIPIENT_FIXTURES / "expected_recipients.txt"),
            )
            self.assertEqual(len(result.warnings), 1)
            self.assertTrue(result.report_path and result.report_path.exists())
            with result.report_path.open(newline="", encoding="utf-8") as fh:
                rows = list(DictReader(fh))
            self.assertGreater(len(rows), 2)
            known_flags = [row["known_recipient"] for row in rows]
            self.assertEqual(known_flags, sorted(known_flags, reverse=True))
            missing_rows = [row for row in rows if row["affected_email_address"] == "missing@example.com"]
            self.assertTrue(missing_rows)
            self.assertTrue(all(row["known_recipient"] == "true" for row in missing_rows))
            absent_rows = [row for row in rows if row["affected_email_address"] == "absent@example.com"]
            self.assertEqual(len(absent_rows), 1)
            self.assertEqual(absent_rows[0]["normalized_event_type"], "diagnostic.expected_recipient.no_evidence")
            self.assertEqual(absent_rows[0]["assessment_category"], "undef")
            self.assertEqual(absent_rows[0]["assessment_subclass"], "undef.no_signal")
            self.assertEqual(absent_rows[0]["evidence_strength"], "none")
            store = StateStore(config.storage.path)
            try:
                quality_addresses = {row["affected_email_address"] for row in store.endpoint_quality_rows()}
            finally:
                store.close()
            self.assertNotIn("absent@example.com", quality_addresses)

    def test_csv_expected_recipients_are_supported(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            config = AppConfig(
                mailbox=MailboxConfig(id="test-mailbox", protocol="fixture"),
                scan=ScanConfig(),
                storage=StorageConfig(path=str(root / "state.sqlite")),
                reports=ReportsConfig(output_dir=str(root / "reports")),
                source=SourceConfig(fixture_dir=str(FIXTURES)),
            )
            result = scan_mailbox(
                config,
                full_rescan=True,
                expected_recipients_path=str(RECIPIENT_FIXTURES / "expected_recipients.csv"),
                expected_recipient_column="email",
            )
            self.assertEqual(len(result.warnings), 1)
            self.assertTrue(result.report_path and result.report_path.exists())
            with result.report_path.open(newline="", encoding="utf-8") as fh:
                rows = list(DictReader(fh))
            optout_rows = [row for row in rows if row["affected_email_address"] == "optout@example.com"]
            self.assertTrue(optout_rows)
            self.assertTrue(all(row["known_recipient"] == "true" for row in optout_rows))
            csv_absent = [row for row in rows if row["affected_email_address"] == "csv-absent@example.com"]
            self.assertEqual(csv_absent[0]["normalized_event_type"], "diagnostic.expected_recipient.no_evidence")

    def test_datetime_range_excludes_messages_outside_range(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            expected_path = root / "expected.txt"
            expected_path.write_text("complained@example.com\nmissing@example.com\n", encoding="utf-8")
            config = AppConfig(
                mailbox=MailboxConfig(id="test-mailbox", protocol="fixture"),
                scan=ScanConfig(),
                storage=StorageConfig(path=str(root / "state.sqlite")),
                reports=ReportsConfig(output_dir=str(root / "reports")),
                source=SourceConfig(fixture_dir=str(FIXTURES)),
            )
            result = scan_mailbox(
                config,
                full_rescan=True,
                expected_recipients_path=str(expected_path),
                range_from="2026-06-02T10:04:00Z",
                range_to="2026-06-02T10:04:00Z",
            )
            self.assertEqual(result.scan.messages_seen, 11)
            self.assertEqual(result.scan.messages_parsed, 1)
            self.assertTrue(result.report_path and result.report_path.exists())
            with result.report_path.open(newline="", encoding="utf-8") as fh:
                rows = list(DictReader(fh))
            self.assertEqual({row["affected_email_address"] for row in rows}, {"complained@example.com", "missing@example.com"})
            complaint = [row for row in rows if row["affected_email_address"] == "complained@example.com"][0]
            missing = [row for row in rows if row["affected_email_address"] == "missing@example.com"][0]
            self.assertEqual(complaint["normalized_event_type"], "notification.channel.complaint_received")
            self.assertEqual(missing["normalized_event_type"], "diagnostic.expected_recipient.no_evidence")
            self.assertEqual(result.scan.range_start.isoformat(), "2026-06-02T10:04:00+00:00")
            self.assertEqual(result.scan.range_end.isoformat(), "2026-06-02T10:04:00+00:00")


if __name__ == "__main__":
    unittest.main()

@@ -2,7 +2,7 @@
id: EMAIL-WP-0001
type: workplan
title: "Repository Onboarding and Implementation Foundation"
-domain: custodian
domain: infotech
repo: email-connect
status: finished
owner: codex

@@ -2,7 +2,7 @@
id: EMAIL-WP-0002
type: workplan
title: "MVP Mailbox Evidence Scanner"
-domain: custodian
domain: infotech
repo: email-connect
status: finished
owner: codex
@@ -653,7 +653,8 @@ This allows rescanning old messages after parser improvements.
Initial mappings:
| Parsed class | Normalized event | Assessment |
| ------------------------- | ------------------------------------------- | ------------------------------- |
| `hard_bounce` | `notification.endpoint.rejected_permanent` | `fail.hard_bounce` |
| `soft_bounce` | `notification.endpoint.rejected_temporary` | `undef.deferred` |
| `delayed_delivery_notice` | `notification.endpoint.deferred` | `undef.deferred` |
@@ -672,7 +673,8 @@ The scanner should update basic endpoint quality.
Examples:
| Evidence | Endpoint quality update |
| ------------- | ---------------------------------------------------------- |
| Hard bounce | `reachability = unreachable`, `last_failure_at = now` |
| Soft bounce | `reachability = degraded`, `last_failure_at = now` |
| Complaint | `suppression_state = suppressed` |

@@ -0,0 +1,356 @@
---
id: EMAIL-WP-0003
type: workplan
title: "Expected Recipient Reporting and Mailbox Tutorial"
domain: infotech
repo: email-connect
status: finished
owner: codex
topic_slug: custodian
created: "2026-06-02"
updated: "2026-06-02"
state_hub_workstream_id: "438149d2-4f20-42b1-91fd-cdeff29dec7d"
---
# EMAIL-WP-0003 - Expected Recipient Reporting and Mailbox Tutorial
## 1. Purpose
This workplan extends the mailbox evidence scanner so operators can provide an
optional set of target email addresses that were expected to receive
notifications. When expected recipients are provided, the scanner should include
them in the evidence report even when no mailbox evidence is known for a given
recipient.
The result is a report that can answer both:
```text
What evidence did the mailbox contain?
Which expected recipients have no known email-channel evidence?
```
The scanner must continue to work without a target-recipient list. Email events
remain evidence, not proof of delivery, awareness, or coordination success.
## 2. User Story
As an operator, I want to provide a line-separated list or CSV file of email
addresses that were supposed to receive notifications, scan a mailbox for return
evidence within a chosen time range, and generate a report where expected
recipients are easy to filter and appear before incidental mailbox-only
addresses.
## 3. In Scope
The workplan shall support:
- Optional expected-recipient input from a newline-separated text file.
- Optional expected-recipient input from CSV.
- CLI and config support for recipient list paths.
- Email address normalization and deduplication.
- Reports that can be generated without any expected-recipient input.
- Reports that include expected recipients with no known evidence.
- An explicit `known_recipient` boolean column in the report.
- Default report ordering with known recipients first.
- An `undef` evidence/report row for expected recipients where nothing is known.
- Mail inspection limited to a datetime range.
- Excluding all email evidence from messages outside the configured range.
- Tests that prove no overclaiming occurs for unknown expected recipients.
- A tutorial for generating a mailbox report from configuration through output
review.
## 4. Out of Scope
The workplan does not require:
- Outbound sending.
- Proving that all provided expected recipients were actually contacted.
- Requiring expected recipients for report generation.
- Legal delivery assessment.
- A suppression-management UI.
- Multi-mailbox correlation.
- Cross-batch campaign management.
## 5. Report Semantics
Expected recipients are advisory context supplied by the operator. If an
expected recipient has no evidence rows, the scanner should emit a conservative
unknown row:
```text
event_type: diagnostic.expected_recipient.no_evidence
assessment_category: undef
assessment_subclass: undef.no_signal
evidence_strength: none
known_recipient: true
```
This row means only:
```text
The recipient was supplied as an expected notification target, and this scan
found no mailbox evidence for that address in the inspected time range.
```
It must not mean:
```text
delivery failed
delivery succeeded
recipient was not notified
recipient ignored the message
```
Mailbox-only evidence rows for addresses not in the supplied expected-recipient
set should remain visible with:
```text
known_recipient: false
```
If no expected-recipient input is provided, the report should still be generated
from mailbox evidence only and `known_recipient` should default to `false`.
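Spreadsheet users can filter on `known_recipient`. The same review can be scripted with Python's standard `csv` module. The following is a minimal sketch that makes these assumptions:

- The report uses the `known_recipient` and `normalized_event_type` column names exercised by the scanner tests.
- `split_report_rows` and its three group names are hypothetical and not part of email-connect.
- The sample CSV is trimmed to three columns.

```python
from __future__ import annotations

import csv
import io
from typing import Iterable

# Event type emitted for expected recipients with no mailbox evidence.
NO_EVIDENCE_EVENT = "diagnostic.expected_recipient.no_evidence"


def split_report_rows(lines: Iterable[str]) -> dict[str, list[dict[str, str]]]:
    """Group report rows for review (hypothetical helper, not part of email-connect)."""
    groups: dict[str, list[dict[str, str]]] = {"evidence": [], "no_evidence": [], "mailbox_only": []}
    for row in csv.DictReader(lines):
        if row["known_recipient"] != "true":
            # Evidence for an address outside the expected-recipient set.
            groups["mailbox_only"].append(row)
        elif row["normalized_event_type"] == NO_EVIDENCE_EVENT:
            # undef/no-signal: neither a delivery failure nor a delivery success.
            groups["no_evidence"].append(row)
        else:
            groups["evidence"].append(row)
    return groups


# Trimmed sample; a real report has many more columns.
sample = io.StringIO(
    "affected_email_address,known_recipient,normalized_event_type\n"
    "absent@example.com,true,diagnostic.expected_recipient.no_evidence\n"
    "missing@example.com,true,notification.endpoint.rejected_permanent\n"
    "other@example.com,false,notification.endpoint.rejected_permanent\n"
)
groups = split_report_rows(sample)
for name, rows in groups.items():
    print(name, [row["affected_email_address"] for row in rows])
```

The `no_evidence` group should be read as "nothing known", never as a count of failed deliveries.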
## 6. Time Range Semantics
The scanner should support an optional inclusive datetime range:
```text
--from 2026-06-01T00:00:00Z
--to 2026-06-02T23:59:59Z
```
Messages outside the range must be excluded before parsing and evidence
generation whenever the message timestamp is available. The range should also be
usable from config. If a message has no parseable timestamp while a range is
active, it is excluded because the scanner cannot confirm that it originated
inside the requested window.
Existing `--since` behavior may be retained as a compatibility alias for the
lower bound, but the new range should be expressed clearly in documentation.
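The inclusive-range rule can be sketched as follows. This sketch assumes ISO 8601 bounds and treats naive values as UTC, as the scanner's `_parse_datetime` does in this commit; the full string handling of that function is not shown in the diff. `parse_bound` and `in_range` are hypothetical names that mirror `_parse_datetime` and `_in_range`.

```python
from __future__ import annotations

from datetime import datetime, timezone


def parse_bound(value: str | None) -> datetime | None:
    """Parse an ISO 8601 bound; naive values are treated as UTC."""
    if not value or not value.strip():
        return None
    text = value.strip()
    if text.endswith(("Z", "z")):
        # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def in_range(received_at: datetime | None, start: datetime | None, end: datetime | None) -> bool:
    """Inclusive check; a missing timestamp is excluded while any bound is active."""
    if start is None and end is None:
        return True
    if received_at is None:
        return False
    if start is not None and received_at < start:
        return False
    if end is not None and received_at > end:
        return False
    return True


start = parse_bound("2026-06-01T00:00:00Z")
end = parse_bound("2026-06-02T23:59:59Z")
print(in_range(end, start, end))  # the upper bound itself is inside the range
print(in_range(parse_bound("2026-06-03T00:30:00+02:00"), start, end))  # 2026-06-02T22:30Z
print(in_range(parse_bound("2026-06-03T00:00:00Z"), start, end))
print(in_range(None, start, end))
```

Normalizing every bound and timestamp to UTC before comparing means offsets such as `+02:00` cannot shift a message across the window edge.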
## 7. CLI Target
Example commands:
```text
email-connect scan-mailbox --config config/mailbox.yml --out reports/
email-connect scan-mailbox --config config/mailbox.yml --expected-recipients recipients.txt --out reports/
email-connect scan-mailbox --config config/mailbox.yml --expected-recipients recipients.csv --expected-recipient-column email --out reports/
email-connect scan-mailbox --config config/mailbox.yml --from 2026-06-01T00:00:00Z --to 2026-06-02T23:59:59Z --out reports/
```
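The flag-to-keyword mapping visible in this commit can be reproduced with `argparse`. For example, `--from` maps to `range_from` and `--expected-recipients` maps to `expected_recipients`. This is a sketch under assumptions: the definitions of `--config`, `--out` and `--since` are not shown in the diff, so their options here, including the `output_dir` destination, are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the scan-mailbox options; --config, --out and --since are assumptions."""
    parser = argparse.ArgumentParser(prog="email-connect")
    commands = parser.add_subparsers(dest="command", required=True)
    scan = commands.add_parser("scan-mailbox")
    scan.add_argument("--config", default=None)  # assumption: definition not shown in the diff
    scan.add_argument("--out", dest="output_dir", default=None)  # assumption: dest name
    scan.add_argument("--since", default=None)  # assumption: only args.since is visible
    # "from" is a Python keyword, so args.from would be a syntax error; use explicit dest names.
    scan.add_argument("--from", dest="range_from", default=None)
    scan.add_argument("--to", dest="range_to", default=None)
    scan.add_argument("--expected-recipients", default=None)
    scan.add_argument("--expected-recipient-column", default=None)
    scan.add_argument("--report-only-new", action="store_true")
    scan.add_argument("--dry-run", action="store_true")
    scan.add_argument("--fixture-dir", default=None)
    return parser


args = build_parser().parse_args([
    "scan-mailbox", "--config", "config/mailbox.yml",
    "--from", "2026-06-01T00:00:00Z", "--to", "2026-06-02T23:59:59Z",
    "--out", "reports/",
])
print(args.range_from, args.range_to, args.output_dir, args.expected_recipients)
```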
## 8. Work Packages
## T01 - Expected Recipient Input Model
```task
id: EMAIL-WP-0003-T01
status: done
priority: high
state_hub_task_id: "d1cd0de0-cbd5-4e8d-8179-000ba10e5506"
```
Tasks:
```text
Add expected-recipient config fields
Add CLI option for expected-recipient file path
Support newline-separated email address files
Support CSV files with configurable email column
Normalize addresses case-insensitively
Deduplicate recipient addresses
Reject or warn on invalid addresses without aborting the scan
```
Acceptance:
```text
The scanner can load zero, one, or many expected recipients from text or CSV.
Invalid recipient rows are visible as warnings or diagnostics.
```
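The following self-contained sketch shows normalization, deduplication and invalid-row handling, run against the contents of the CSV fixture added in this commit. `load_csv_addresses` is hypothetical. The shipped loader is `load_expected_recipients`, which labels invalid entries with a `path:line:column` source rather than the `line N:` prefix used here.

```python
from __future__ import annotations

import csv
import io
import re

# Same pattern as EMAIL_RE in email_connect/recipients.py.
EMAIL_RE = re.compile(r"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$", re.IGNORECASE)


def load_csv_addresses(text: str, column: str = "email") -> tuple[tuple[str, ...], tuple[str, ...]]:
    """Return (unique normalized addresses, invalid entries) from CSV text."""
    seen: dict[str, None] = {}  # dict keys keep first-seen order and drop duplicates
    invalid: list[str] = []
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None:
        return (), ()
    # Fall back to the first column when the requested one is missing.
    col = column if column in reader.fieldnames else reader.fieldnames[0]
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        raw = (row.get(col) or "").strip()
        normalized = raw.lower()
        if normalized and EMAIL_RE.fullmatch(normalized):
            seen[normalized] = None
        elif raw:
            # Invalid rows are reported, not fatal, so the scan can continue.
            invalid.append(f"line {line_number}: {raw}")
    return tuple(seen), tuple(invalid)


fixture = (
    "email,name\n"
    "optout@example.com,Opt Out\n"
    "csv-absent@example.com,Missing From Mailbox\n"
    "OPTOut@example.com,Duplicate Case Variant\n"
    "not-an-address,Invalid\n"
)
addresses, invalid = load_csv_addresses(fixture)
print(addresses)
print(invalid)
```

The case variant `OPTOut@example.com` collapses into the first `optout@example.com` entry. Only `not-an-address` is reported as invalid, which matches the expectations in `tests/test_recipients.py`.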
## T02 - Known Recipient Report Column and Ordering
```task
id: EMAIL-WP-0003-T02
status: done
priority: high
state_hub_task_id: "3d7d3bb8-4118-4158-b874-b4e0527eaa85"
```
Tasks:
```text
Add known_recipient boolean column to CSV reports
Mark evidence rows true when affected_email_address matches expected recipients
Mark mailbox-only rows false when no expected list is provided or no match exists
Sort report rows with known recipients first by default
Preserve deterministic secondary sorting
Document filtering behavior for spreadsheet users
```
Acceptance:
```text
Generated reports include known_recipient and place known-recipient rows before
unknown-recipient rows by default.
```
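The known-first ordering is a tuple sort key: `False` sorts before `True`, so "not known" places known recipients first. Address, `observed_at`, `event_type` and `deduplication_key` then act as deterministic tie-breakers. The following is a minimal sketch of the key used by `_ordered_rows`. It assumes the expected set already holds lowercased addresses, as the recipient loader produces. `order_rows` is a hypothetical name.

```python
from __future__ import annotations


def order_rows(rows: list[dict[str, str]], expected: set[str]) -> list[dict[str, str]]:
    """Known recipients first, then deterministic tie-breakers (hypothetical helper)."""
    if not expected:
        # No expected list: keep the storage order unchanged.
        return list(rows)

    def sort_key(row: dict[str, str]) -> tuple:
        address = row.get("affected_email_address") or ""
        known = bool(address) and address.lower() in expected
        return (
            not known,
            address,
            row.get("observed_at") or "",
            row.get("event_type") or "",
            row.get("deduplication_key") or "",
        )

    return sorted(rows, key=sort_key)


rows = [
    {"affected_email_address": "zed@example.com", "observed_at": "2026-06-02T10:00:00+00:00"},
    {"affected_email_address": "b@example.com", "observed_at": "2026-06-02T09:00:00+00:00"},
    {"affected_email_address": "a@example.com", "observed_at": "2026-06-02T11:00:00+00:00"},
]
ordered = order_rows(rows, expected={"zed@example.com"})
print([row["affected_email_address"] for row in ordered])
```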
## T03 - No-Evidence Rows for Expected Recipients
```task
id: EMAIL-WP-0003-T03
status: done
priority: high
state_hub_task_id: "aa737837-2f19-4fbf-9920-f98413bd9779"
```
Tasks:
```text
Detect expected recipients with no matching evidence in the inspected range
Generate diagnostic.expected_recipient.no_evidence rows for those recipients
Use assessment_category undef
Use assessment_subclass undef.no_signal
Use evidence_strength none
Avoid endpoint-quality updates from no-evidence rows
Avoid implying delivery failure or delivery success
Deduplicate generated no-evidence rows across rescans
```
Acceptance:
```text
Expected recipients with no mailbox evidence appear in the report as undef
no-signal diagnostics, not as failures or successes.
```
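The key format below is copied from `_no_evidence_row` in this commit. The helper names `no_evidence_addresses` and `no_evidence_key` are hypothetical. The key depends only on the mailbox id, the address and the range bounds. Rescanning the same range therefore reproduces the same key, while a different range yields a distinct row.

```python
from __future__ import annotations

from datetime import datetime, timezone


def no_evidence_addresses(expected: set[str], evidence_rows: list[dict]) -> list[str]:
    """Expected addresses with no evidence row in the inspected range, sorted for stable output."""
    seen = {
        str(row.get("affected_email_address") or "").lower()
        for row in evidence_rows
        if row.get("affected_email_address")
    }
    return sorted(expected - seen)


def no_evidence_key(
    mailbox_id: str,
    address: str,
    range_start: datetime | None,
    range_end: datetime | None,
) -> str:
    """Deterministic key: the same mailbox, address and range always yield the same key."""
    range_key = "|".join([
        range_start.isoformat() if range_start else "",
        range_end.isoformat() if range_end else "",
    ])
    return f"{mailbox_id}|expected_recipient|no_evidence|{address}|{range_key}"


bound = datetime(2026, 6, 2, 10, 4, tzinfo=timezone.utc)
evidence = [{"affected_email_address": "MISSING@example.com"}]
for address in no_evidence_addresses({"missing@example.com", "absent@example.com"}, evidence):
    print(no_evidence_key("test-mailbox", address, bound, bound))
```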
## T04 - Optional Recipient Context
```task
id: EMAIL-WP-0003-T04
status: done
priority: medium
state_hub_task_id: "731cf592-1bbe-4143-b21b-721af281528c"
```
Tasks:
```text
Keep report generation working when no recipient list is provided
Keep report generation working when recipient list is empty
Ensure expected-recipient input is not required for mailbox-only reports
Ensure mailbox-only evidence remains visible even when expected recipients are provided
```
Acceptance:
```text
Reports can be generated with no expected recipients, empty expected recipients,
or partial expected recipients.
```
## T05 - Datetime Range Filtering
```task
id: EMAIL-WP-0003-T05
status: done
priority: high
state_hub_task_id: "22585e83-d995-42d9-9ab2-c383b055fbb8"
```
Tasks:
```text
Add config fields for scan datetime lower and upper bounds
Add CLI options for datetime lower and upper bounds
Treat --since as a compatibility alias for the lower bound
Exclude messages outside the configured range from parsing and evidence generation
Define behavior for messages with no parseable Date header
Apply filtering consistently to fixture and IMAP scans
Store the range on MailboxScan
Add tests for inclusive lower and upper bounds
```
Acceptance:
```text
When a datetime range is configured, the scanner inspects only messages whose
message timestamp falls within the range according to the documented rules.
```
## T06 - Report and Evidence Tests
```task
id: EMAIL-WP-0003-T06
status: done
priority: high
state_hub_task_id: "f30cd5b9-5035-42b4-9eca-a104e2b26ecb"
```
Tasks:
```text
Add text recipient-list fixture
Add CSV recipient-list fixture
Add tests for known_recipient true and false rows
Add tests for known-recipient-first ordering
Add tests for no-evidence undef rows
Add tests that no-evidence rows do not update endpoint quality
Add tests for report generation with no recipient input
Add tests for datetime range exclusion
```
Acceptance:
```text
Automated tests prove expected-recipient reporting, optional recipient input,
and datetime range filtering.
```
## T07 - Mailbox Report Tutorial
```task
id: EMAIL-WP-0003-T07
status: done
priority: medium
state_hub_task_id: "00a29cb9-ac5a-4784-a9c4-7f2d4905405c"
```
Tasks:
```text
Create a tutorial for configuring mailbox access
Show fixture-based dry run
Show live IMAP configuration
Show expected-recipient text list usage
Show expected-recipient CSV usage
Show datetime range usage
Explain known_recipient filtering
Explain undef no-signal rows
Explain evidence limitations and overclaim prevention
Include troubleshooting notes for credentials and empty reports
```
Acceptance:
```text
A new user can follow the tutorial to generate and interpret an email-connect
mailbox evidence report.
```
## 9. Completion Criteria
This workplan is complete when:
1. Expected-recipient input is optional and supports text and CSV files.
2. Reports include `known_recipient`.
3. Known recipients sort first by default.
4. Expected recipients with no evidence produce `undef.no_signal` diagnostic
rows.
5. The scanner still works without any expected recipients.
6. Datetime range filtering excludes messages outside the inspected range.
7. Tests cover recipient input, report ordering, no-evidence rows, optional
input, and datetime range filtering.
8. A tutorial documents how to generate a mailbox evidence report.