Python: how to parse fields such as From, To, and Body from a raw email source

Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/17872094/

Date: 2020-08-19 09:22:11  Source: igfitidea

Tags: python, regex, python-2.7, mod-wsgi, wsgi

Question

The raw email usually looks something like this:

From [email protected] Thu Jul 25 19:28:59 2013
Received: from a1.local.tld (localhost [127.0.0.1])
    by a1.local.tld (8.14.4/8.14.4) with ESMTP id r6Q2SxeQ003866
    for <[email protected]>; Thu, 25 Jul 2013 19:28:59 -0700
Received: (from root@localhost)
    by a1.local.tld (8.14.4/8.14.4/Submit) id r6Q2Sxbh003865;
    Thu, 25 Jul 2013 19:28:59 -0700
From: [email protected]
Subject: ooooooooooooooooooooooo
To: [email protected]
Cc: 
X-Originating-IP: 192.168.15.127
X-Mailer: Webmin 1.420
Message-Id: <1374805739.3861@a1>
Date: Thu, 25 Jul 2013 19:28:59 -0700 (PDT)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="bound1374805739"

This is a multi-part message in MIME format.

--bound1374805739
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo

--bound1374805739--

So if I wanted to write a Python script to get the following fields:

From
To
Subject
Body

Is this the kind of code I should build on, or is there a better method?

a='<title>aaa</title><title>aaa2</title><title>aaa3</title>'

import re
a1 = re.findall(r'<(title)>(.*?)<(/title)>', a)

Accepted answer by Daniel Roseman

I don't really understand what your final code snippet has to do with anything - you haven't mentioned anything about HTML until that point, so I don't know why you would suddenly be giving an example of parsing HTML (which you should never do with a regex anyway).

In any case, to answer your original question about getting the headers from an email message, Python includes code to do that in the standard library:

import email

# email_string holds the raw message text shown in the question
msg = email.message_from_string(email_string)
msg['from']  # '[email protected]'
msg['to']    # '[email protected]'
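
As a slightly fuller sketch of the same call, the snippet below parses a short hand-written message and reads three headers from it. The addresses and subject are made up for illustration and are not taken from the question. Header lookup on the parsed message is case-insensitive, and a header that is absent comes back as None instead of raising.

```python
import email

# Hypothetical sample message; addresses and subject are placeholders.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: hello\n"
    "\n"
    "Just a short body.\n"
)

msg = email.message_from_string(raw)

sender = msg["from"]        # header names are matched case-insensitively
recipient = msg["To"]
subject = msg["SUBJECT"]
missing = msg["Reply-To"]   # absent headers give None instead of raising

print(sender, recipient, subject, missing)
```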

Answer by Serial

You could write that raw content to a file, then read the file like this:

with open('in.txt', 'r') as f:
    raw = f.readlines()

get_list = ['From:', 'To:', 'Subject:']
info_list = []

# Keep only the header lines we care about, without the trailing newline
for line in raw:
    for word in get_list:
        if line.startswith(word):
            info_list.append(line.rstrip('\n'))

Now info_list will be:

['From: [email protected]', 'Subject: ooooooooooooooooooooooo', 'To: [email protected]']

I don't see a Body: header in your raw content.
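
If the raw message is already in a file, the standard library can also parse it straight from the file object, which copes with a header folded over several lines, something a line-by-line startswith check would cut short. The following is a minimal sketch for Python 3; the message text, the addresses, and the temporary file are all hypothetical stand-ins for in.txt. With the default parsing policy, the fold's line break stays inside the returned header value.

```python
import email
import os
import tempfile

# Hypothetical message with a Subject header folded across two lines.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: a fairly long subject\n"
    " that continues on the next line\n"
    "\n"
    "Body text.\n"
)

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    with open(path, "r") as fh:
        msg = email.message_from_file(fh)
finally:
    os.remove(path)

# Collect the three headers of interest into a dictionary.
info = {name: msg[name] for name in ("From", "To", "Subject")}
print(info)
```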

Answer by Mark Roberts

You should probably use email.parser:

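# The trailing backslash keeps the string from starting with a blank line,
# which the parser would read as the end of the headers.
s = """\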
From [email protected] Thu Jul 25 19:28:59 2013
Received: from a1.local.tld (localhost [127.0.0.1])
    by a1.local.tld (8.14.4/8.14.4) with ESMTP id r6Q2SxeQ003866
    for <[email protected]>; Thu, 25 Jul 2013 19:28:59 -0700
Received: (from root@localhost)
    by a1.local.tld (8.14.4/8.14.4/Submit) id r6Q2Sxbh003865;
    Thu, 25 Jul 2013 19:28:59 -0700
From: [email protected]
Subject: ooooooooooooooooooooooo
To: [email protected]
Cc: 
X-Originating-IP: 192.168.15.127
X-Mailer: Webmin 1.420
Message-Id: <1374805739.3861@a1>
Date: Thu, 25 Jul 2013 19:28:59 -0700 (PDT)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="bound1374805739"

This is a multi-part message in MIME format.

--bound1374805739
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo

--bound1374805739--
"""

import email.parser

msg = email.parser.Parser().parsestr(s)
help(msg)
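
help(msg) prints the whole API of the message object. As a shorter way to see what was parsed, the sketch below lists the header names and the content type of a small multipart message. The message text and the boundary name are hypothetical, shaped like the one in the question.

```python
from email.parser import Parser

# Hypothetical multipart message, shaped like the one in the question.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="b1"\n'
    "\n"
    "This is a multi-part message in MIME format.\n"
    "\n"
    "--b1\n"
    "Content-Type: text/plain\n"
    "\n"
    "first part\n"
    "\n"
    "--b1--\n"
)

msg = Parser().parsestr(raw)

header_names = msg.keys()              # header names in their original order
content_type = msg.get_content_type()  # main type and subtype, lowercased
multipart = msg.is_multipart()         # True when the payload is a list of parts

print(header_names)
print(content_type, multipart)
```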

Answer by P0W

"Body" is not present in your sample email

You can use the email module:

import email
msg = email.message_from_string(email_message_as_text)

Then use:

# Index the parsed message (msg), not the email module
print(msg['To'])
print(msg['From'])

... and so on.

Answer by Kiwi

Fortunately Python makes this simpler: http://docs.python.org/2.7/library/email.parser.html#email.parser.Parser

from email.parser import Parser
parser = Parser()

emailText = """PUT THE RAW TEXT OF YOUR EMAIL HERE"""
email = parser.parsestr(emailText)

# print(...) with one argument works on both Python 2.7 and Python 3
print(email.get('From'))
print(email.get('To'))
print(email.get('Subject'))

The body is trickier. Call email.is_multipart() on the parsed message. If that returns False, you can get the body by calling email.get_payload(). However, if it returns True, email.get_payload() will return a list of messages, so you'll have to call get_payload() on each of those.

if email.is_multipart():
    # A multipart payload is a list of sub-messages
    for part in email.get_payload():
        print(part.get_payload())
else:
    print(email.get_payload())
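
The loop above only looks one level deep and prints each payload as stored. A possible variant, sketched here for Python 3, uses walk() to visit nested parts and get_payload(decode=True) to undo the transfer encoding. The helper name extract_text_body, the sample message, and the UTF-8 fallback charset are assumptions made for illustration and do not come from the answer.

```python
from email.parser import Parser


def extract_text_body(msg):
    """Return the concatenated text/plain parts of a parsed message."""
    chunks = []
    for part in msg.walk():  # visits the message itself and every nested part
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes with transfer encoding undone
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"  # assumed fallback
        chunks.append(payload.decode(charset, errors="replace"))
    return "".join(chunks)


# Hypothetical multipart message with one quoted-printable text part.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="b1"\n'
    "\n"
    "This is a multi-part message in MIME format.\n"
    "\n"
    "--b1\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "caf=C3=A9\n"
    "\n"
    "--b1--\n"
)

msg = Parser().parsestr(raw)
body = extract_text_body(msg)
print(body)
```

Filtering on text/plain skips attachments and HTML alternatives; a message that carries only text/html would need that type handled as well.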