Original question: http://stackoverflow.com/questions/26322255/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Parsing outlook .msg files with python
Asked by Michael
Looked around and couldn't find a satisfactory answer. Does anyone know how to parse .msg files from Outlook with Python?
I've tried using mimetools and email.parser with no luck. Help would be greatly appreciated!
Accepted answer by Brent Edwards
This works for me:
import win32com.client

# Requires Windows with Outlook installed (pywin32 drives Outlook over COM)
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
msg = outlook.OpenSharedItem(r"C:\test_msg.msg")

print(msg.SenderName)
print(msg.SenderEmailAddress)
print(msg.SentOn)
print(msg.To)
print(msg.CC)
print(msg.BCC)
print(msg.Subject)
print(msg.Body)

# Outlook collections are 1-based, hence item + 1
count_attachments = msg.Attachments.Count
if count_attachments > 0:
    for item in range(count_attachments):
        print(msg.Attachments.Item(item + 1).Filename)

del outlook, msg
The To, CC and BCC properties return names (e.g. "John Doe") rather than email addresses; a separate post covers methods to access the addresses themselves.
Answer by Dmitry Streblechenko
See the following links:
[MS-OXMSG]: Outlook Item (.msg) File Format,
Read from .msg files,
Edit a saved Outlook Message File *.msg
You can also use Redemption and its RDOSession.GetMessageFromMsgFile method:
set Session = CreateObject("Redemption.RDOSession")
set Msg = Session.GetMessageFromMsgFile("c:\temp\test.msg")
MsgBox Msg.Subject
Answer by fatih_dur
Even though this is an old thread, I hope this information helps anyone looking for a solution to exactly what the thread subject says. I strongly recommend mattgwwalker's solution on GitHub, which requires the OleFileIO_PL module to be installed separately.
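A .msg file is an OLE2 compound document (the container format that OleFileIO_PL reads) rather than MIME text, which is likely why email.parser gets nowhere with it. As a minimal sketch using only the standard library, the container signature can be checked before handing a file to a parser. The helper name looks_like_msg is hypothetical, and a match only shows that the file is an OLE2 container, not that it is a valid Outlook message.

```python
# Signature found in the first 8 bytes of every OLE2 compound file
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_msg(path):
    """Return True if the file starts with the OLE2 signature (hypothetical helper)."""
    with open(path, "rb") as f:
        return f.read(len(OLE2_MAGIC)) == OLE2_MAGIC
```

A check like this can help separate genuine .msg files from .eml files that were merely renamed.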
Answer by paolov
I've tried the Python email module, and sometimes it fails to parse the .msg file.
So, in this case, if you are only after text or HTML, the following code worked for me.
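start_text = "<html>"
end_text = "</html>"

def parse_msg(msg_file, start_text, end_text):
    # .msg is a binary container, so read bytes and decode leniently
    with open(msg_file, "rb") as f:
        b = f.read().decode("latin-1")
    return b[b.find(start_text):b.find(end_text) + len(end_text)]

# path_to_msg_file must be set to the location of your .msg file
print(parse_msg(path_to_msg_file, start_text, end_text))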
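One caveat with the slicing approach: str.find returns -1 when a marker is missing, so the slice silently returns the wrong span. A slightly more defensive sketch of the same idea follows. The name extract_between is hypothetical, and the approach assumes the HTML body is stored uncompressed and in an ASCII-compatible encoding, which a .msg file does not guarantee.

```python
def extract_between(data, start=b"<html", end=b"</html>"):
    """Return the first start..end span of data (bytes), or None if not found."""
    # bytes.lower() only changes ASCII letters, so offsets stay valid
    lowered = data.lower()
    begin = lowered.find(start)
    if begin == -1:
        return None
    stop = lowered.find(end, begin)
    if stop == -1:
        return None
    return data[begin:stop + len(end)]
```

To use it, read the file in binary mode and pass the bytes in; a None result means the markers were not found and another approach is needed.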
Answer by Vladimir Lukin
I succeeded in extracting relevant fields from MS Outlook files (.msg) using the msg-extractor utility by Matt Walker.
Prerequisites
pip install extract-msg
Note that it may require additional modules; in my case, I had to install imapclient:
pip install imapclient
Usage
import extract_msg
f = r'MS_Outlook_file.msg' # Replace with yours
msg = extract_msg.Message(f)
msg_sender = msg.sender
msg_date = msg.date
msg_subj = msg.subject
msg_message = msg.body
print('Sender: {}'.format(msg_sender))
print('Sent On: {}'.format(msg_date))
print('Subject: {}'.format(msg_subj))
print('Body: {}'.format(msg_message))
The MsgExtractor utility has many other goodies to explore, but this is a good start.
Note
I had to comment out lines 3 to 8 within the file C:\Anaconda3\Scripts\ExtractMsg.py:
#"""
#ExtractMsg:
# Extracts emails and attachments saved in Microsoft Outlook's .msg files
#
#https://github.com/mattgwwalker/msg-extractor
#"""
Error message was:
line 3
ExtractMsg:
^
SyntaxError: invalid syntax
After commenting out those lines, the error message disappeared and the code worked just fine.
Answer by Sazzad
I was able to parse it in a similar way to what Vladimir describes above. However, I needed to make a small change by adding a for loop: glob.glob(r'c:\test_email\*.msg') returns a list, whereas Message(f) expects a single file or str.
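import glob
import ExtractMsg  # module name in older msg-extractor releases; newer ones use extract_msg

f = glob.glob(r'c:\test_email\*.msg')
for filename in f:
    msg = ExtractMsg.Message(filename)
    msg_sender = msg.sender
    msg_date = msg.date
    msg_subj = msg.subject
    msg_message = msg.body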
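The loop above overwrites its variables on every pass, so only the last message survives. A minimal sketch of collecting the fields from every file instead is shown here. The names parse_folder and parser are hypothetical; parser stands for any callable that takes a path and returns an object with the sender, date, subject and body attributes used in the answers above (for example, the Message class from msg-extractor).

```python
import glob
import os


def parse_folder(folder, parser):
    """Apply parser to every .msg file in folder and collect selected fields.

    parser is a hypothetical parameter: any callable that takes a file path
    and returns an object with sender, date, subject and body attributes.
    """
    results = []
    for filename in sorted(glob.glob(os.path.join(folder, "*.msg"))):
        try:
            msg = parser(filename)
        except Exception as exc:
            # Skip unreadable files instead of aborting the whole batch
            print("Skipping {}: {}".format(filename, exc))
            continue
        results.append({
            "file": os.path.basename(filename),
            "sender": msg.sender,
            "date": msg.date,
            "subject": msg.subject,
            "body": msg.body,
        })
    return results
```

Passing the parser in as an argument keeps the helper independent of which version of the library is installed.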
Answer by Uros
I found a module on the net called MSG PY, a Microsoft Outlook .msg file module for Python. The module lets you easily create, read, parse and convert Outlook .msg files. It does not require Microsoft Outlook, or any other third-party application or library, to be installed on the machine. For example:
from independentsoft.msg import Message
appointment = Message(r"e:\appointment.msg")  # raw string: "\a" would otherwise be read as a bell character
print("subject: " + str(appointment.subject))
print("start_time: " + str(appointment.appointment_start_time))
print("end_time: " + str(appointment.appointment_end_time))
print("location: " + str(appointment.location))
print("is_reminder_set: " + str(appointment.is_reminder_set))
print("sender_name: " + str(appointment.sender_name))
print("sender_email_address: " + str(appointment.sender_email_address))
print("display_to: " + str(appointment.display_to))
print("display_cc: " + str(appointment.display_cc))
print("body: " + str(appointment.body))

