Original URL: http://stackoverflow.com/questions/31071425/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to get the MIME Type of a .MSG file?
Question by CoderNeji
I have tried these ways of finding the MIME type of a file:
Path source = Paths.get("C://Users/akash/Desktop/FW Internal release of MSTClient-Server5.02.04_24.msg");
System.out.println(Files.probeContentType(source));
The above code returns null.
If I use the Apache Tika API to get the MIME type, it returns text/plain.
But I want the result to be application/vnd.ms-outlook.
UPDATE
I also used MIME-Util.jar with the following code:
MimeUtil2 mimeUtil = new MimeUtil2();
mimeUtil.registerMimeDetector("eu.medsea.mimeutil.detector.MagicMimeMimeDetector");
try (RandomAccessFile file1 = new RandomAccessFile(
        "C://Users/akash/Desktop/FW Internal release of MSTClient-Server5.02.04_24.msg", "r")) {
    System.out.println(file1.length());
    // Read the whole file instead of a hard-coded 624128-byte buffer
    byte[] file = new byte[(int) file1.length()];
    file1.readFully(file);
    String mimeType = MimeUtil2.getMostSpecificMimeType(mimeUtil.getMimeTypes(file)).toString();
    System.out.println(mimeType);
}
This gives me application/msword as output.
UPDATE:
The Tika API is out of scope, because it is too large to include in the project.
So how can I find the MIME type?
Accepted answer by Paizo
I tried several possible approaches, and Tika gives the result you expected. I can't see the code you used, so I cannot double-check it.
I tried different approaches, not all of which are in the code snippet:
- Java 7 Files.probeContentType(path)
- URLConnection MIME detection from the file name and content-type guessing
- JDK 6 JAF API javax.activation.MimetypesFileTypeMap
- MimeUtil with all the available subclasses of MimeDetector I found
- Apache Tika
- Apache POI scratchpad
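About the Files.probeContentType entry in the list above: that method first asks every java.nio.file.spi.FileTypeDetector installed through ServiceLoader, and only then falls back to the platform default detector. One way to make it return application/vnd.ms-outlook without Tika is therefore to install your own detector.

The following is a minimal sketch under these assumptions:
- The class name MsgFileTypeDetector is hypothetical.
- It requires Java 9+ (for readNBytes).
- It only combines the .msg extension with the 8-byte OLE2 signature D0 CF 11 E0 A1 B1 1A E1. A .doc or .xls renamed to .msg would therefore still be misreported.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;
import java.util.Arrays;
import java.util.Locale;

/**
 * Hypothetical detector: reports application/vnd.ms-outlook for *.msg files
 * that start with the OLE2 compound-file signature, and null otherwise so
 * that other installed detectors (and the platform default) still get a turn.
 */
public class MsgFileTypeDetector extends FileTypeDetector {

    // OLE2 / Compound File Binary signature, shared by .msg, .doc, .xls, ...
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    @Override
    public String probeContentType(Path path) throws IOException {
        Path name = path.getFileName();
        if (name == null || !name.toString().toLowerCase(Locale.ROOT).endsWith(".msg")) {
            return null; // not a .msg name: let other detectors decide
        }
        byte[] header = new byte[OLE2_SIGNATURE.length];
        try (InputStream in = Files.newInputStream(path)) {
            int n = in.readNBytes(header, 0, header.length);
            if (n == header.length && Arrays.equals(header, OLE2_SIGNATURE)) {
                return "application/vnd.ms-outlook";
            }
        }
        return null;
    }
}
```

To have Files.probeContentType pick it up, list the fully qualified class name in a META-INF/services/java.nio.file.spi.FileTypeDetector file on the classpath. For a quick check you can also call `new MsgFileTypeDetector().probeContentType(path)` directly.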
Here is the test class:
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.util.Collection;
import javax.activation.MimetypesFileTypeMap;
import org.apache.tika.detect.Detector;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AutoDetectParser;
import eu.medsea.mimeutil.MimeUtil;

public class FindMime {

    public static void main(String[] args) {
        // Backslashes must be escaped in Java string literals
        File file = new File("C:\\Users\\qwerty\\Desktop\\test.msg");
        System.out.println("urlConnectionGuess " + urlConnectionGuess(file));
        System.out.println("fileContentGuess " + fileContentGuess(file));
        MimetypesFileTypeMap mimeTypesMap = new MimetypesFileTypeMap();
        System.out.println("mimeTypesMap.getContentType " + mimeTypesMap.getContentType(file));
        System.out.println("mimeutils " + mimeutils(file));
        System.out.println("tika " + tika(file));
    }

    // mime-util with several detectors registered
    private static String mimeutils(File file) {
        MimeUtil.registerMimeDetector("eu.medsea.mimeutil.detector.MagicMimeMimeDetector");
        MimeUtil.registerMimeDetector("eu.medsea.mimeutil.detector.ExtensionMimeDetector");
        // MimeUtil.registerMimeDetector("eu.medsea.mimeutil.detector.OpendesktopMimeDetector");
        MimeUtil.registerMimeDetector("eu.medsea.mimeutil.detector.WindowsRegistryMimeDetector");
        // MimeUtil.registerMimeDetector("eu.medsea.mimeutil.detector.TextMimeDetector");
        try (InputStream is = new BufferedInputStream(new FileInputStream(file))) {
            Collection<?> mimeTypes = MimeUtil.getMimeTypes(is);
            return mimeTypes.toString();
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }

    // Apache Tika: AutoDetectParser's detector, given both the content and the file name
    private static String tika(File file) {
        try (InputStream is = new BufferedInputStream(new FileInputStream(file))) {
            Detector detector = new AutoDetectParser().getDetector();
            Metadata md = new Metadata();
            md.add(Metadata.RESOURCE_NAME_KEY, file.getName());
            MediaType mediaType = detector.detect(is, md);
            return mediaType.toString();
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }

    // Guess from the file name only
    private static String urlConnectionGuess(File file) {
        return URLConnection.guessContentTypeFromName(file.getName());
    }

    // Guess from the first bytes; the stream must support mark/reset, hence BufferedInputStream
    private static String fileContentGuess(File file) {
        try (InputStream is = new BufferedInputStream(new FileInputStream(file))) {
            return URLConnection.guessContentTypeFromStream(is);
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }
}
And this is the output:
urlConnectionGuess null
fileContentGuess null
mimeTypesMap.getContentType application/octet-stream
mimeutils application/msword,application/x-hwp
tika application/vnd.ms-outlook
Update: I added this method to test other ways of using Tika:
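import org.apache.tika.Tika;
import org.apache.tika.detect.TypeDetector;
import org.apache.tika.mime.MimeTypes;

private static void tikaMore(File file) {
    Tika defaultTika = new Tika();                  // Tika with its default detector
    Tika mimeTika = new Tika(new MimeTypes());      // Tika restricted to a bare MimeTypes detector
    Tika typeTika = new Tika(new TypeDetector());   // trusts only an explicit content-type hint
    try {
        System.out.println(defaultTika.detect(file));
        System.out.println(mimeTika.detect(file));
        System.out.println(typeTika.detect(file));
    } catch (Exception e) {
        e.printStackTrace();
    }
}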
Tested with a .msg file without an extension:
application/vnd.ms-outlook
application/octet-stream
application/octet-stream
Tested with a .txt file renamed to .msg:
text/plain
text/plain
application/octet-stream
It seems that the simplest approach, the default (no-argument) Tika constructor, is the most reliable in this case.
Update: you can also write your own checker with Apache POI scratchpad. For example, this simple implementation returns the MIME type of the message, or null if the file is not in the proper format (typically org.apache.poi.poifs.filesystem.NotOLE2FileException: Invalid header signature):
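import org.apache.poi.hsmf.MAPIMessage;

public class PoiMsgMime {

    // Returns the Outlook MIME type if POI can parse the file as a MAPI message, otherwise null
    public String getMessageMime(String fileName) {
        try {
            new MAPIMessage(fileName);
            return "application/vnd.ms-outlook";
        } catch (Exception e) {
            // e.g. NotOLE2FileException: Invalid header signature
            return null;
        }
    }
}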
Answer by Optional
Taking a cue from @Duffydake's comment, I tried reading the magic numbers. It's true that the first 8 bytes of the header are the same for all these MS files: D0 CF 11 E0 A1 B1 1A E1. (Interestingly, the first four bytes, D0 CF 11 E0, read like "DoCFilE".) However, you can check this link for how to interpret the complete header and find the file type. The example in the link identifies an Excel file, but you can use similar byte reading to identify the msg file type.
If you can assume that nobody will store a .doc or .xls file under a .msg extension, you can simply read the first 8 bytes of the header and combine that with the file extension, e.g. if (fileExtension.equals(".msg") && hexHeaderString.equals("D0 CF 11 E0 A1 B1 1A E1")) { mimeType = "application/vnd.ms-outlook"; }
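The header-plus-extension check described above can be sketched with only the JDK.

Assumptions:
- The class MsgHeaderCheck and its methods hexHeader and guessMsgMime are hypothetical names. hexHeader plays the role of the answer's hexHeaderString.
- Java 9+ is required for readNBytes.
- Like the answer, it trusts the extension to tell .msg apart from other OLE2 files such as .doc and .xls.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

public class MsgHeaderCheck {

    // Common OLE2 (Compound File Binary) header, formatted as in the answer
    private static final String OLE2_HEADER = "D0 CF 11 E0 A1 B1 1A E1";

    /** Reads up to the first 8 bytes and formats them as upper-case hex, e.g. "D0 CF 11 ...". */
    static String hexHeader(Path path) throws IOException {
        byte[] header = new byte[8];
        int n;
        try (InputStream in = Files.newInputStream(path)) {
            n = in.readNBytes(header, 0, header.length);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", header[i] & 0xFF));
        }
        return sb.toString();
    }

    /** Returns application/vnd.ms-outlook for a *.msg file with an OLE2 header, otherwise null. */
    static String guessMsgMime(Path path) throws IOException {
        String name = path.getFileName().toString().toLowerCase(Locale.ROOT);
        if (name.endsWith(".msg") && OLE2_HEADER.equals(hexHeader(path))) {
            return "application/vnd.ms-outlook";
        }
        return null;
    }
}
```

Comparing raw bytes would avoid the string formatting. The hex string is kept here only to mirror the answer's hexHeaderString comparison.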
Answer by user
One option is to read the file into a byte[] and then pass it to MimeMagic (Maven location here). Something like this:
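// Commons IO FileUtils to read the bytes, jMimeMagic (net.sf.jmimemagic.Magic) to match them
byte[] data = FileUtils.readFileToByteArray(new File("file.msg"));
MagicMatch match = Magic.getMagicMatch(data);
String mimeType = match.getMimeType();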
I'm not sure this will work 100% of the time, but trying won't hurt :)
Answer by Atron Seige
I had to find another workaround. What I found is that MS documents (doc, docx, xls, xlsx, msg) are container files with different extensions: docx and xlsx are ZIP archives, while doc, xls and msg are OLE2 compound files. I have not tested every MS file type, as that is outside my current scope.
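Before looking inside the container, you first need to know which kind of container it is. A minimal sketch that tells the two apart by their leading bytes:
- ZIP archives usually start with the local file header "PK\3\4".
- OLE2 compound files start with D0 CF 11 E0 A1 B1 1A E1.

Assumptions:
- The class ContainerSniffer and its Container enum are hypothetical names.
- Java 11+ is required for readNBytes(int).
- Only a typical ZIP that begins with a local file header is recognized. Empty or spanned archives are reported as UNKNOWN.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ContainerSniffer {

    enum Container { ZIP, OLE2, UNKNOWN }

    // "PK\3\4": local file header that normally opens a ZIP archive (docx, xlsx)
    private static final byte[] ZIP_SIG = {0x50, 0x4B, 0x03, 0x04};

    // Compound File Binary (OLE2) signature (doc, xls, msg)
    private static final byte[] OLE2_SIG = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    static Container sniff(Path path) throws IOException {
        byte[] head;
        try (InputStream in = Files.newInputStream(path)) {
            head = in.readNBytes(8); // may return fewer bytes for tiny files
        }
        if (startsWith(head, OLE2_SIG)) {
            return Container.OLE2;
        }
        if (startsWith(head, ZIP_SIG)) {
            return Container.ZIP;
        }
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```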
Simply expand the file and:
- docx: open [Content_Types].xml and check whether it contains "wordprocessingml"
- xlsx: open [Content_Types].xml and check whether it contains "spreadsheetml"
- doc: check for an entry named "WordDocument"
- xls: check for an entry named "Workbook"
- msg: check for an entry named "__properties_version1.0"
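The docx/xlsx checks in the list above need only the JDK's ZIP support.

Assumptions for this minimal sketch:
- The class OoxmlKind and its detect method are hypothetical names.
- Java 9+ is required for InputStream.readAllBytes.
- The returned strings are the standard OOXML MIME types, not something taken from the original answer.
- Like the answer, it is a substring heuristic on [Content_Types].xml, not a full OOXML parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class OoxmlKind {

    static final String DOCX_MIME =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
    static final String XLSX_MIME =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";

    /**
     * Opens the file as a ZIP archive and inspects [Content_Types].xml.
     * Returns the docx or xlsx MIME type, or null if neither marker is found.
     * A non-ZIP file makes ZipFile throw java.util.zip.ZipException (an IOException).
     */
    static String detect(String path) throws IOException {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("[Content_Types].xml");
            if (entry == null) {
                return null; // a ZIP archive, but not an OOXML package
            }
            String xml;
            try (InputStream in = zip.getInputStream(entry)) {
                xml = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            if (xml.contains("wordprocessingml")) {
                return DOCX_MIME;
            }
            if (xml.contains("spreadsheetml")) {
                return XLSX_MIME;
            }
            return null;
        }
    }
}
```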
I am still testing msg to see whether there is something better to use, but this entry exists in both sent and unsent messages, so I assume it is safe to use.
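With Apache POI on the classpath, the robust way to check for the "__properties_version1.0" entry would be to open the file with POIFSFileSystem and ask its root directory whether it has that entry.

Without POI, a rough dependency-free heuristic relies on two facts:
- Compound-file directory entry names are stored as UTF-16LE.
- So an OLE2 file whose bytes contain that name encoded in UTF-16LE very likely has such an entry.

This sketch is only that heuristic, not a real directory parser. The class MsgEntryScan and its looksLikeMsg method are hypothetical names. It reads the whole file into memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MsgEntryScan {

    // Compound File Binary (OLE2) signature
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Directory entry names in a compound file are UTF-16LE (no BOM)
    private static final byte[] PROPS_NAME =
        "__properties_version1.0".getBytes(StandardCharsets.UTF_16LE);

    /** Heuristic: OLE2 signature plus the UTF-16LE name of the msg properties stream. */
    static boolean looksLikeMsg(Path path) throws IOException {
        byte[] data = Files.readAllBytes(path);
        return startsWith(data, OLE2_SIGNATURE) && indexOf(data, PROPS_NAME) >= 0;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Naive byte-pattern search; returns the first match offset or -1
    private static int indexOf(byte[] data, byte[] pattern) {
        outer:
        for (int i = 0; i + pattern.length <= data.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```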