Python: How to parse things such as From, To, and Body from a raw email source with Python
Original question: http://stackoverflow.com/questions/17872094/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Question
The raw email usually looks something like this:
From [email protected] Thu Jul 25 19:28:59 2013
Received: from a1.local.tld (localhost [127.0.0.1])
    by a1.local.tld (8.14.4/8.14.4) with ESMTP id r6Q2SxeQ003866
    for <[email protected]>; Thu, 25 Jul 2013 19:28:59 -0700
Received: (from root@localhost)
    by a1.local.tld (8.14.4/8.14.4/Submit) id r6Q2Sxbh003865;
    Thu, 25 Jul 2013 19:28:59 -0700
From: [email protected]
Subject: ooooooooooooooooooooooo
To: [email protected]
Cc:
X-Originating-IP: 192.168.15.127
X-Mailer: Webmin 1.420
Message-Id: <1374805739.3861@a1>
Date: Thu, 25 Jul 2013 19:28:59 -0700 (PDT)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="bound1374805739"

This is a multi-part message in MIME format.

--bound1374805739
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo

--bound1374805739--
So if I wanted to write a Python script to get the following fields:
From
To
Subject
Body
Is this the code I should build on, or is there a better method?
a='<title>aaa</title><title>aaa2</title><title>aaa3</title>'
import re
a1 = re.findall(r'<(title)>(.*?)<(/title)>', a)
Accepted answer by Daniel Roseman
I don't really understand what your final code snippet has to do with anything - you haven't mentioned anything about HTML until that point, so I don't know why you would suddenly give an example of parsing HTML (which you should never do with a regex anyway).
In any case, to answer your original question about getting the headers from an email message, Python includes code to do that in the standard library:
import email
msg = email.message_from_string(email_string)
msg['from'] # '[email protected]'
msg['to'] # '[email protected]'
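To make the snippet above concrete, here is a minimal sketch, assuming Python 3 and a short made-up message. The addresses and the name `email_string` content are placeholders for illustration, not values from the question.

```python
import email
from email.utils import parseaddr

# Hypothetical sample message; the addresses are placeholders, not from the question.
email_string = (
    "From: Alice Example <alice@example.com>\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "\n"
    "This is the body.\n"
)

msg = email.message_from_string(email_string)

# Header lookup is case-insensitive; a missing header yields None.
sender = msg["from"]
recipient = msg["TO"]
subject = msg["Subject"]
missing = msg["Cc"]

# parseaddr splits a display name from the bare address.
sender_name, sender_addr = parseaddr(sender)

# For a non-multipart message, get_payload() returns the body as a string.
body = msg.get_payload()

print(sender_name, sender_addr, recipient, subject, repr(body))
```

The body is everything after the first blank line, which is why it is read with get_payload() rather than looked up like a header.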
Answer by Serial
You could write that raw content to a file, then read the file like this:
with open('in.txt', 'r') as f:
    raw = f.readlines()
get_list = ['From:', 'To:', 'Subject:']
info_list = []
for line in raw:
    for word in get_list:
        if line.startswith(word):
            # strip the trailing newline that readlines() keeps
            info_list.append(line.rstrip('\n'))
Now info_list will be:
['From: [email protected]', 'Subject: ooooooooooooooooooooooo', 'To: [email protected]']
I don't see a Body: header in your raw content. The body is not a header; it is the text that follows the first blank line after the headers.
Answer by Mark Roberts
You should probably use email.parser:
import email.parser
# The backslash after the opening quotes keeps the string from starting with
# a blank line, which the parser would treat as the end of the headers.
# Continuation lines of a folded header must start with whitespace.
s = """\
From [email protected] Thu Jul 25 19:28:59 2013
Received: from a1.local.tld (localhost [127.0.0.1])
    by a1.local.tld (8.14.4/8.14.4) with ESMTP id r6Q2SxeQ003866
    for <[email protected]>; Thu, 25 Jul 2013 19:28:59 -0700
Received: (from root@localhost)
    by a1.local.tld (8.14.4/8.14.4/Submit) id r6Q2Sxbh003865;
    Thu, 25 Jul 2013 19:28:59 -0700
From: [email protected]
Subject: ooooooooooooooooooooooo
To: [email protected]
Cc:
X-Originating-IP: 192.168.15.127
X-Mailer: Webmin 1.420
Message-Id: <1374805739.3861@a1>
Date: Thu, 25 Jul 2013 19:28:59 -0700 (PDT)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="bound1374805739"

This is a multi-part message in MIME format.

--bound1374805739
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo

--bound1374805739--
"""
msg = email.parser.Parser().parsestr(s)
help(msg)  # lists the methods available on the parsed Message object
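Rather than reading through the help(msg) output, a few Message methods can be tried directly. The following is a rough sketch, assuming Python 3. The message is a shortened, made-up one, and its hosts, addresses, ids, and boundary name are placeholders.

```python
import email.parser

# Hypothetical, shortened message; hosts and addresses are placeholders.
raw = """\
Received: from mail.example.com (localhost [127.0.0.1])
    by mail.example.com with ESMTP id abc123;
    Thu, 25 Jul 2013 19:28:59 -0700
Received: (from root@localhost)
    by mail.example.com id def456;
    Thu, 25 Jul 2013 19:28:59 -0700
From: sender@example.com
To: recipient@example.com
Subject: test message
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="b1"

This is a multi-part message in MIME format.

--b1
Content-Type: text/plain

hello

--b1--
"""

msg = email.parser.Parser().parsestr(raw)

header_names = msg.keys()               # header names, in message order
received = msg.get_all("Received")      # every value of a repeated header
content_type = msg.get_content_type()   # main type/subtype, lowercased
parts = msg.get_payload() if msg.is_multipart() else []

print(header_names)
print(len(received), content_type, len(parts))
```

Note that msg['Received'] would return only one of the two Received headers, so get_all() is the safer call for headers that can repeat.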
Answer by P0W
Answer by Kiwi
Fortunately, Python makes this simpler: http://docs.python.org/2.7/library/email.parser.html#email.parser.Parser
from email.parser import Parser
parser = Parser()
emailText = """PUT THE RAW TEXT OF YOUR EMAIL HERE"""
email = parser.parsestr(emailText)
print(email.get('From'))
print(email.get('To'))
print(email.get('Subject'))
The body is trickier. Call email.is_multipart(). If that returns False, you can get the body by calling email.get_payload(). However, if it returns True, email.get_payload() will return a list of messages, so you'll have to call get_payload() on each of those.
if email.is_multipart():
    for part in email.get_payload():
        print(part.get_payload())
else:
    print(email.get_payload())
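The loop above handles only one level of nesting and prints payloads still in their transfer encoding. As a possible extension, here is a sketch, assuming Python 3, that uses walk() to reach nested parts and get_payload(decode=True) to undo base64 or quoted-printable. The helper name get_text_body, the sample message, and the UTF-8 fallback are assumptions made for illustration.

```python
from email.parser import Parser


def get_text_body(message):
    """Return the first text/plain part of a parsed message as str, or None."""
    for part in message.walk():  # walk() visits the message and all subparts
        if part.get_content_type() != "text/plain":
            continue
        if part.get_filename():  # skip attachments that happen to be text
            continue
        data = part.get_payload(decode=True)  # bytes, transfer encoding undone
        if data is None:
            continue
        # Assumption: fall back to UTF-8 when the part declares no charset.
        charset = part.get_content_charset() or "utf-8"
        return data.decode(charset, errors="replace")
    return None


# Hypothetical message with a base64-encoded text part; addresses are placeholders.
raw = """\
From: sender@example.com
To: recipient@example.com
Subject: nested example
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

aGVsbG8gd29ybGQ=

--outer--
"""

msg = Parser().parsestr(raw)
text = get_text_body(msg)
print(text)
```

The same helper also works for a message that is not multipart, because walk() yields the message itself first.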