Java: How to reliably detect file types?
Original question: http://stackoverflow.com/questions/9738597/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by James Raitsev
Objective: given a file, determine whether it is of a given type (XML, JSON, Properties, etc.).
Consider the case of XML. Until we ran into this issue, the following sample approach worked fine:
try {
    saxReader.read(f);
} catch (DocumentException e) {
    logger.warn(" - File is not XML: " + e.getMessage());
    return false;
}
return true;
As expected, when the XML is well formed, the test passes and the method returns true. If something goes wrong and the file can't be parsed, false is returned.
This breaks, however, when we deal with a malformed XML file (which is still XML).
I'd rather not rely on the .xml extension (fails all the time), on looking for the <?xml version="1.0" encoding="UTF-8"?> string inside the file, etc.
Is there another way this can be handled?
What would you have to see inside the file to "suspect it may be XML even though a DocumentException was caught"? This is needed for parsing purposes.
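One possible way to approach that last question, as a minimal sketch only: it assumes the JDK's built-in SAX parser rather than the dom4j SAXReader used above, and the class name XmlSuspicion and its Verdict values are hypothetical. The idea is to record whether the parser saw at least one element before it failed. If it did, the content is probably malformed XML rather than something else entirely.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original question or of dom4j.
public class XmlSuspicion {

    public enum Verdict { WELL_FORMED, MALFORMED_BUT_XML_LIKE, NOT_XML }

    public static Verdict classify(String content) {
        // Flipped to true as soon as the parser reports any start tag.
        final boolean[] sawElement = {false};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                sawElement[0] = true;
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(content)), handler);
            return Verdict.WELL_FORMED;
        } catch (SAXException e) {
            // Parsing failed: if an element was seen first, suspect broken XML.
            return sawElement[0] ? Verdict.MALFORMED_BUT_XML_LIKE : Verdict.NOT_XML;
        } catch (IOException | ParserConfigurationException e) {
            return Verdict.NOT_XML;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("<a><b/></a>"));
        System.out.println(classify("<a><b></a>"));
        System.out.println(classify("key=value"));
    }
}
```

This is a heuristic, not a guarantee: a file that is broken before its first start tag is still reported as NOT_XML, and untrusted input would need the parser hardened against external entities first.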
Accepted answer by Lior Kogan
File type detection tools:
Answer by rjdkolb
Apache Tika gives me the fewest issues and, unlike Java 7's Files.probeContentType, is not platform specific.
import java.io.File;
import java.io.IOException;
import org.apache.tika.Tika;

File inputFile = ...
// detect(File) reads the file and may throw IOException
String type = new Tika().detect(inputFile);
System.out.println(type);
For an XML file I got 'application/xml'; for a properties file I got 'text/plain'.
You can, however, add a Detector to the new Tika() instance.
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>1.xx</version>
</dependency>
Answer by kazy
For those who do not need very precise detection, there is Java 7's Files.probeContentType method (mentioned by rjdkolb):
Path filePath = Paths.get("/path/to/your/file.jpg");
String contentType = Files.probeContentType(filePath);
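Note that Files.probeContentType returns null when the type cannot be determined and can throw IOException. A minimal sketch of a wrapper that handles both cases follows. The class name ContentTypeProbe, the method detectOrDefault, and the fallback value are hypothetical choices for illustration, not part of the original answer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper around Files.probeContentType.
public class ContentTypeProbe {

    /** Returns the probed MIME type, or the fallback if it is unknown or an I/O error occurs. */
    public static String detectOrDefault(Path file, String fallback) {
        try {
            String type = Files.probeContentType(file);
            // probeContentType returns null when no detector recognizes the file.
            return type != null ? type : fallback;
        } catch (IOException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Path filePath = Paths.get("/path/to/your/file.jpg");
        // The result depends on the platform's installed file type detectors.
        System.out.println(detectOrDefault(filePath, "application/octet-stream"));
    }
}
```

Because the result is platform dependent, the same file may be reported differently on different systems, which is the limitation rjdkolb's answer points out.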