Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/26322255/

Time: 2020-08-19 00:21:56  Source: igfitidea

Parsing outlook .msg files with python

Tags: python, email, outlook

Asked by Michael

I looked around and couldn't find a satisfactory answer. Does anyone know how to parse .msg files from Outlook with Python?

I've tried using mimetools and email.parser with no luck. Help would be greatly appreciated!

Accepted answer by Brent Edwards

This works for me:

import win32com.client  # from pywin32; needs Windows with Outlook installed

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
msg = outlook.OpenSharedItem(r"C:\test_msg.msg")

# Header fields and body of the message
print(msg.SenderName)
print(msg.SenderEmailAddress)
print(msg.SentOn)
print(msg.To)
print(msg.CC)
print(msg.BCC)
print(msg.Subject)
print(msg.Body)

# Attachments is a 1-based COM collection
count_attachments = msg.Attachments.Count
for item in range(1, count_attachments + 1):
    print(msg.Attachments.Item(item).FileName)

del outlook, msg

Note that the To, CC and BCC properties return display names (e.g. "John Doe"), not email addresses. The original answer links to a separate post on how to retrieve the addresses themselves.
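
To list addresses rather than display names, one option is to walk the message's Recipients collection instead of reading To, CC and BCC. The following is a minimal sketch, assuming msg is the item returned by OpenSharedItem in the code above; the helper name recipient_addresses is made up for illustration. For Exchange accounts, Address may come back as an internal Exchange-style address rather than an SMTP one, so treat the result as a starting point.

```python
# Outlook's OlMailRecipientType values: 1 = To, 2 = CC, 3 = BCC.
RECIPIENT_KINDS = {1: "To", 2: "CC", 3: "BCC"}


def recipient_addresses(msg):
    """Return (kind, display name, address) for each recipient of a MailItem-like object.

    Hypothetical helper: it only relies on the Recipients collection and the
    Type, Name and Address properties of each recipient.
    """
    rows = []
    for recipient in msg.Recipients:
        kind = RECIPIENT_KINDS.get(recipient.Type, "Other")
        rows.append((kind, recipient.Name, recipient.Address))
    return rows
```

Calling recipient_addresses(msg) before the final del statement gives one tuple per recipient, which can then be filtered by kind.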

Answer by Dmitry Streblechenko

See the following links:
[MS-OXMSG]: Outlook Item (.msg) File Format,
Read from .msg files,
Edit a saved Outlook Message File *.msg
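
The .msg format described in [MS-OXMSG] is built on the OLE compound file container rather than on MIME text, which is likely why email.parser gets nowhere with it. As a small illustration, the sketch below checks whether a file starts with the compound-file signature, using only the standard library. The function name looks_like_msg is an assumption made for this example, and other OLE files (such as legacy .doc or .xls) share the same signature, so a match only shows that the container type is right.

```python
# Signature that opens every OLE compound file (the container used by .msg).
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_msg(path):
    """Return True if the file starts with the compound-file signature.

    Hypothetical helper: it does not validate the MAPI properties inside,
    only the first eight bytes of the file.
    """
    with open(path, "rb") as f:
        return f.read(len(CFB_MAGIC)) == CFB_MAGIC
```

A check like this can be used to skip MIME-style .eml files before handing a path to a .msg parser.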

You can also use Redemption and its RDOSession.GetMessageFromMsgFile method:

  set Session = CreateObject("Redemption.RDOSession")
  set Msg = Session.GetMessageFromMsgFile("c:\temp\test.msg")
  MsgBox Msg.Subject

Answer by fatih_dur

Even though this is an old thread, I hope this information helps someone looking for a solution to exactly what the thread subject says. I strongly advise using mattgwwalker's solution on GitHub, which requires the OleFileIO_PL module to be installed separately.

Answer by paolov

I've tried the Python email module, and sometimes it fails to parse the .msg file.

So, in this case, if you are only after the text or HTML, the following code worked for me.

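start_text = b"<html>"
end_text = b"</html>"

def parse_msg(msg_file, start_text, end_text):
    # .msg files are binary, so read bytes rather than text
    with open(msg_file, "rb") as f:
        b = f.read()
    start = b.find(start_text)
    end = b.find(end_text, start)
    if start == -1 or end == -1:
        return ""  # no HTML block found
    # Decode leniently: the encoding of the embedded HTML is not known here
    return b[start:end + len(end_text)].decode("utf-8", errors="replace")

path_to_msg_file = r"C:\test_msg.msg"  # placeholder path; replace with yours
print(parse_msg(path_to_msg_file, start_text, end_text))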

Answer by Vladimir Lukin

I succeeded in extracting the relevant fields from MS Outlook files (.msg) using the msg-extractor utility by Matt Walker.

Prerequisites

pip install extract-msg

Note that it may require additional modules; in my case, I also had to install imapclient:

pip install imapclient

Usage

import extract_msg

f = r'MS_Outlook_file.msg'  # Replace with yours
msg = extract_msg.Message(f)
msg_sender = msg.sender
msg_date = msg.date
msg_subj = msg.subject
msg_message = msg.body

print('Sender: {}'.format(msg_sender))
print('Sent On: {}'.format(msg_date))
print('Subject: {}'.format(msg_subj))
print('Body: {}'.format(msg_message))

The MsgExtractor utility has many other features to explore, but this is a good starting point.

Note

I had to comment out lines 3 to 8 within the file C:\Anaconda3\Scripts\ExtractMsg.py:

#"""
#ExtractMsg:
#    Extracts emails and attachments saved in Microsoft Outlook's .msg files
#
#https://github.com/mattgwwalker/msg-extractor
#"""

The error message was:

line 3
    ExtractMsg:
              ^
SyntaxError: invalid syntax

After commenting out those lines, the error message disappeared and the code worked just fine.

Answer by Sazzad

I was able to parse it in a similar way to what Vladimir describes above. However, I needed to make a small change by adding a for loop: glob.glob(r'c:\test_email\*.msg') returns a list, whereas Message(f) expects a single file or str.

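import glob
import ExtractMsg  # module name as used in this answer; the answer above imports it as extract_msg

# glob.glob returns a list of matching paths
f = glob.glob(r'c:\test_email\*.msg')

# Message() takes one path at a time, so loop over the list
for filename in f:
    msg = ExtractMsg.Message(filename)
    msg_sender = msg.sender
    msg_date = msg.date
    msg_subj = msg.subject
    msg_message = msg.body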
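
The loop above overwrites its variables on every pass, so only the last message survives. One way to keep all of them is to collect a dict per file and, optionally, write the result to CSV. The sketch below assumes the parser is passed in as a callable (for example ExtractMsg.Message) and that the returned object exposes sender, date, subject and body as in the answers above; the names collect_messages and write_csv are made up for illustration.

```python
import csv
import glob

FIELDS = ("sender", "date", "subject", "body")


def collect_messages(pattern, open_message):
    """Parse every file matching pattern and return one dict per message.

    open_message is any callable that takes a path and returns an object
    with the attributes listed in FIELDS (an assumption of this sketch).
    """
    rows = []
    for filename in sorted(glob.glob(pattern)):
        msg = open_message(filename)
        row = {"file": filename}
        for field in FIELDS:
            # Missing attributes become None instead of raising
            row[field] = getattr(msg, field, None)
        rows.append(row)
    return rows


def write_csv(rows, out_path):
    """Write the collected rows to a CSV file with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=("file",) + FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

With the module from this answer, a call would look like rows = collect_messages(r'c:\test_email\*.msg', ExtractMsg.Message), followed by write_csv(rows, 'messages.csv'). Passing the parser in as an argument keeps the helper independent of which module name your installation uses.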

Answer by Uros

I found a module online called MSG PY, a Microsoft Outlook .msg file module for Python. It lets you easily create, read, parse, and convert Outlook .msg files. It does not require Microsoft Outlook, or any other third-party application or library, to be installed on the machine. For example:

from independentsoft.msg import Message

appointment = Message(r"e:\appointment.msg")  # raw string, so "\a" is not read as an escape sequence

print("subject: " + str(appointment.subject))
print("start_time: " + str(appointment.appointment_start_time))
print("end_time: " + str(appointment.appointment_end_time))
print("location: " + str(appointment.location))
print("is_reminder_set: " + str(appointment.is_reminder_set))
print("sender_name: " + str(appointment.sender_name))
print("sender_email_address: " + str(appointment.sender_email_address))
print("display_to: " + str(appointment.display_to))
print("display_cc: " + str(appointment.display_cc))
print("body: " + str(appointment.body))