
Fix asEmailMessage() raises ValueError when message has multiple TO recipients #477

Draft
glorat wants to merge 16 commits into TeamMsgExtractor:next-release from glorat:multi-to

glorat commented Apr 18, 2026

Multiple To recipients produce duplicate TO keys in msg.header. The header-copy loop was assigning each individually, but EmailMessage enforces a single TO field. Duplicate keys are now merged into one comma-separated value before assignment.

Adds example multi-to.msg and tests covering both parsing and EML conversion.
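
As a rough illustration of the merge described above (a sketch, not the code in this PR): `EmailMessage` with its default policy raises `ValueError` on a second `To` assignment, so repeated names are joined into one comma-separated value first. The helper name `merge_duplicate_headers` and the sample addresses are hypothetical.

```python
from email.message import EmailMessage


def merge_duplicate_headers(pairs):
    """Join the values of repeated header names with ', ' (hypothetical helper)."""
    merged = {}
    for name, value in pairs:
        if name in merged:
            merged[name] = f"{merged[name]}, {value}"
        else:
            merged[name] = value
    return merged


# Stand-in for the duplicate keys found in msg.header (sample data, assumed).
header_pairs = [
    ("To", "alice@example.com"),
    ("To", "bob@example.com"),
    ("Subject", "Hello"),
]

eml = EmailMessage()
for name, value in merge_duplicate_headers(header_pairs).items():
    # Each name is now assigned exactly once, so the single-To limit holds.
    eml[name] = value
```

This sketch compares names exactly as written; headers that differ only in case would still count as separate keys.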

CHECKLIST

  • Issue asEmailMessage() raises ValueError when message has multiple TO recipients #476
  • Have you listed any changes to install or build dependencies? NONE
  • Ensured your changes are compatible with Python 2.7 (ONLY FOR v0.29). - N/A
  • Have you updated the CHANGELOG with details of changes to the codebase, this includes new functionality, deprecated features, or any other material changes.
  • If necessary, have you bumped the version number? No - left it to you guys
  • Have you included py.test tests with your pull request. (Not yet necessary)
  • Ensured your code is as close to PEP 8 compliant as possible?
  • Ensured your pull request is to the next-release branch (or v0.29 if applicable)?

glorat changed the title from "Multi to" to "Fix asEmailMessage() raises ValueError when message has multiple TO recipients" Apr 18, 2026
glorat added 3 commits April 18, 2026 22:36
The header dedup dict was keyed case-sensitively, so 'TO' and 'To'
were treated as separate keys and both assigned to the EmailMessage,
triggering the single-field constraint. Dedup now keys by lowercased
name while preserving the original casing of the first occurrence.

Adds multi-to-to.msg and tests covering both parsing and EML conversion.
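
A minimal sketch of case-insensitive dedup as this commit message describes it, assuming the headers arrive as (name, value) pairs. The function name and sample values are hypothetical, and the PR's real implementation may differ.

```python
def merge_headers_case_insensitive(pairs):
    """Merge headers whose names differ only in case (hypothetical helper).

    Keys are compared lowercased; the casing of the first occurrence is kept.
    """
    merged = {}  # lowercased name -> (first-seen name, combined value)
    for name, value in pairs:
        key = name.lower()
        if key in merged:
            first_name, combined = merged[key]
            merged[key] = (first_name, f"{combined}, {value}")
        else:
            merged[key] = (name, value)
    return list(merged.values())


result = merge_headers_case_insensitive([
    ("TO", "alice@example.com"),
    ("To", "bob@example.com"),
    ("Subject", "Hello"),
])
```

Here `result` holds a single `TO` entry carrying both addresses, followed by the untouched `Subject` entry.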
… headers

Improper header unfolding left tab characters mid-value after stripping
newlines. RFC 2047 encoded words using invalid charsets (e.g. malformed
GB2312) were passed raw to EmailMessage, whose folding code then failed
to re-encode the decoded replacement characters via as_bytes().

The fix unfolds headers per RFC 5322 and re-encodes problematic encoded
words as UTF-8 before assigning them to EmailMessage.
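
The two steps could look roughly like the sketch below. It makes assumptions beyond the commit message: strict RFC 5322 unfolding only removes the line break that precedes whitespace, whereas `unfold` here also collapses the folding whitespace to one space so that no tab stays mid-value; and `reencode_as_utf8` rewrites every encoded word, not only the problematic ones. Both function names are hypothetical.

```python
import base64
import re
from email.errors import HeaderParseError
from email.header import decode_header

# Matches one RFC 2047 encoded word: =?charset?B-or-Q?payload?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def unfold(value):
    """Replace each line break plus folding whitespace with a single space."""
    return re.sub(r"\r?\n[ \t]+", " ", value)


def reencode_as_utf8(value):
    """Rewrite every encoded word in `value` as a UTF-8 base64 encoded word."""
    def fix(match):
        try:
            raw, charset = decode_header(match.group(0))[0]
        except HeaderParseError:
            return match.group(0)  # undecodable payload: leave untouched
        if isinstance(raw, str):
            return raw  # decode_header did not treat it as an encoded word
        try:
            text = raw.decode(charset)
        except (LookupError, UnicodeDecodeError):
            # Assumed fallback: latin-1 accepts any byte sequence.
            text = raw.decode("latin-1")
        payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
        return f"=?utf-8?b?{payload}?="

    return ENCODED_WORD.sub(fix, value)


unfolded = unfold("Weekly report\r\n\tfor the team")
fixed = reencode_as_utf8("=?gb2312?B?xOO6ww==?= <a@example.com>")
```

After this, `fixed` carries the same display name as a UTF-8 encoded word, which `EmailMessage` can decode and re-fold without touching the original charset.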

Real-world emails often label headers as GB2312 but use byte sequences
that are only valid in GBK (a strict superset). The previous latin-1
fallback produced garbled display names. The decoder now tries GBK/CP936
before latin-1 when a GB2312-declared encoded word fails to decode.
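
The fallback order can be sketched as follows, assuming the raw bytes and the declared charset of an encoded word are already available. `decode_declared` is a hypothetical name; in Python, `cp936` is an alias of the `gbk` codec, so trying `gbk` covers both.

```python
def decode_declared(raw, charset):
    """Decode `raw` using the declared charset, with a GBK step for GB2312."""
    candidates = [charset]
    if charset.lower() == "gb2312":
        candidates.append("gbk")  # superset of GB2312; "cp936" is an alias
    candidates.append("latin-1")  # last resort: accepts any byte sequence
    for codec in candidates:
        try:
            return raw.decode(codec)
        except (LookupError, UnicodeDecodeError):
            continue
    return raw.decode("latin-1", errors="replace")  # not reached in practice


# U+9F8D is a traditional character that GBK can encode but GB2312 cannot.
gbk_only = "\u9f8d".encode("gbk")
name = decode_declared(gbk_only, "gb2312")
```

Trying GBK before latin-1 matters because latin-1 never fails: once it is reached, a multi-byte sequence is silently turned into unrelated single-byte characters.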
@glorat glorat marked this pull request as draft April 19, 2026 00:18