Update get_payload to decode the Content-Transfer-Encoding header #4

Open
noahbass wants to merge 1 commit into isiah-lloyd:master from noahbass:payload-fixes

@noahbass
Contributor

Addresses some issues in #2.

This decodes the message payload when the Content-Transfer-Encoding header is base64 or quoted-printable.

So with these changes, this message with a quoted-printable Content-Transfer-Encoding header:

Hello world, =0A=0AIf Lorem ipsum dolor sit amet, consectetur=
adipiscing elit. Donec vel ex egestas ante scelerisque vulputate eget et me=
us. =0A=0A In non sollicitudin nulla,=
accumsan vehicula velit=

Will now be output as this:

Hello world, Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec vel ex egestas ante scelerisque vulputate eget et metus. In non sollicitudin nulla, accumsan vehicula velit.

See get_payload documentation for details on using `decode=True`:
https://docs.python.org/2/library/email.message.html#email.message.Message.get_payload
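
A minimal sketch of the behavior described above, assuming Python 3 (where `get_payload(decode=True)` returns `bytes`). The message text is a made-up example, not taken from this repository:

```python
from email import message_from_string

# Hypothetical message, used only to illustrate the decoding step.
raw = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Hello world, =0A=0ALorem ipsum dolor sit amet, consectetur=\n"
    " adipiscing elit.\n"
)

msg = message_from_string(raw)

# Without decode, the payload still contains the quoted-printable escapes.
undecoded = msg.get_payload()

# With decode=True, the payload is decoded according to the
# Content-Transfer-Encoding header and returned as bytes.
decoded = msg.get_payload(decode=True)
text = decoded.decode("utf-8")
```

Here `undecoded` still holds the `=0A` escapes and soft line breaks, while `text` has them resolved.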
@isiah-lloyd
Owner

I don't know how many emails it would affect, but it seems that setting decode to True will cause get_payload() to return None if the email is a multipart message. Should we check the content type first, and set decode to True only if the message is not multipart?
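
The concern can be reproduced with a small sketch, assuming Python 3's standard `email` package. The message below is built by hand purely for illustration:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical two-part message (plain text plus HTML alternative).
multipart_msg = MIMEMultipart("alternative")
multipart_msg.attach(MIMEText("plain body", "plain"))
multipart_msg.attach(MIMEText("<p>html body</p>", "html"))

# On a multipart container, decode=True yields None rather than a body.
decoded_payload = multipart_msg.get_payload(decode=True)

# Without decode, the container returns its list of sub-messages.
parts = multipart_msg.get_payload()
```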

@noahbass
Contributor Author

Maybe this block would work better? (Untested)

msg_body = ""
for part in msg.walk():
    if part.get_content_maintype() == 'multipart':
        continue
    if part.get_content_type() == 'text/plain':
        msg_body = part.get_payload(decode=True)

A part can itself be multipart, so if it is, just skip over it?

@noahbass
Contributor Author

Well, actually I've thought about it and since the loop walks through each part in a message (which could be a multipart message):

msg_body = ""
for part in msg.walk():
    if part.get_content_type() == 'text/plain':
        msg_body = part.get_payload(decode=True)
        break

When it reaches get_payload(decode=True), part will never have a multipart content type because its content type is always text/plain at that point, so get_payload will always work.
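
A runnable sketch of that loop, assuming Python 3. The helper name `first_plain_text` and the test message are hypothetical and not part of this PR:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def first_plain_text(msg):
    """Return the decoded payload of the first text/plain part, or None."""
    for part in msg.walk():
        # A multipart container never reports text/plain, so only leaf
        # parts reach get_payload(decode=True) here.
        if part.get_content_type() == "text/plain":
            return part.get_payload(decode=True)
    return None


# Hypothetical multipart message; the utf-8 text part is base64-encoded
# by MIMEText, so decoding is actually exercised.
msg = MIMEMultipart("alternative")
msg.attach(MIMEText("<p>html</p>", "html"))
msg.attach(MIMEText("Héllo world", "plain", "utf-8"))

body = first_plain_text(msg)
```

Note that under Python 3 the result is `bytes`, so a caller would still need to decode it with the part's charset before treating it as text.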

@isiah-lloyd
Owner

isiah-lloyd commented Oct 21, 2017

I think that looks good.

EDIT: Actually, what about an else clause so that get_payload() is called with decode set to False, so that a multipart message is still posted?
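
One possible reading of that suggestion, sketched under the assumption of Python 3. The function name `extract_body` is hypothetical, and the fallback shown here is only one interpretation of the else clause:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def extract_body(msg):
    """Prefer a decoded text/plain part; otherwise fall back to the raw payload."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            return part.get_payload(decode=True)
    # No text/plain part was found: call get_payload() without decoding,
    # so something is still returned for the message.
    return msg.get_payload()


# Hypothetical messages for illustration.
html_only = MIMEMultipart("alternative")
html_only.attach(MIMEText("<p>hi</p>", "html"))
plain_only = MIMEText("hi", "plain")

fallback_result = extract_body(html_only)
plain_result = extract_body(plain_only)
```

In the fallback case the return value is the list of sub-messages rather than a string, so the caller would need to handle both shapes.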
