java库从文件内容中查找mime类型

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/4348810/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-14 16:21:24  来源:igfitidea点击:

java library to find the mime type from file content

javamimefile-type

提问by Ajith Jose

I am searching for a java library which tells you the mime type by looking at the file content(byte array). I found this project using jmimemagic and it no longer supports newer file types (eg. MS word docx format) as it is inactive now (from 2006).

我正在寻找一个 Java 库,它通过查看文件内容(字节数组)来告诉您 MIME 类型。我发现这个项目使用 jmimemagic,它不再支持较新的文件类型(例如 MS word docx 格式),因为它现在处于非活动状态(从 2006 年开始)。

采纳答案by Ajith Jose

Use Apache tika for content detection. Please find the link below. http://tika.apache.org/0.8/detection.html. We have so many jar dependencies which you can find when you build tika using maven

使用 Apache tika 进行内容检测。请找到以下链接。http://tika.apache.org/0.8/detection.html。我们有很多 jar 依赖项,您可以在使用 maven 构建 tika 时找到它们

        ByteArrayInputStream bai = new ByteArrayInputStream(pByte);
        ContentHandler contenthandler = new BodyContentHandler();
        Metadata metadata = new Metadata();
        Parser parser = new AutoDetectParser();
        try {
              parser.parse(bai, contenthandler, metadata);

        } catch (IOException e) {
              // TODO Auto-generated catch block
              e.printStackTrace();
        } catch (SAXException e) {
              // TODO Auto-generated catch block
              e.printStackTrace();
        } catch (TikaException e) {
              // TODO Auto-generated catch block
              e.printStackTrace();
        }           
        System.out.println("Mime: " + metadata.get(Metadata.CONTENT_TYPE));
        return metadata.get(Metadata.CONTENT_TYPE);

回答by fischermatte

Maybe useful for someone, who needs the most used office formats as well (and does not use Apache Tika):

也许对需要最常用办公格式(并且不使用 Apache Tika)的人有用:

public class MimeTypeUtils {

    private static final Map<String, String> fileExtensionMap;

    static {
        fileExtensionMap = new HashMap<String, String>();
        // MS Office
        fileExtensionMap.put("doc", "application/msword");
        fileExtensionMap.put("dot", "application/msword");
        fileExtensionMap.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        fileExtensionMap.put("dotx", "application/vnd.openxmlformats-officedocument.wordprocessingml.template");
        fileExtensionMap.put("docm", "application/vnd.ms-word.document.macroEnabled.12");
        fileExtensionMap.put("dotm", "application/vnd.ms-word.template.macroEnabled.12");
        fileExtensionMap.put("xls", "application/vnd.ms-excel");
        fileExtensionMap.put("xlt", "application/vnd.ms-excel");
        fileExtensionMap.put("xla", "application/vnd.ms-excel");
        fileExtensionMap.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
        fileExtensionMap.put("xltx", "application/vnd.openxmlformats-officedocument.spreadsheetml.template");
        fileExtensionMap.put("xlsm", "application/vnd.ms-excel.sheet.macroEnabled.12");
        fileExtensionMap.put("xltm", "application/vnd.ms-excel.template.macroEnabled.12");
        fileExtensionMap.put("xlam", "application/vnd.ms-excel.addin.macroEnabled.12");
        fileExtensionMap.put("xlsb", "application/vnd.ms-excel.sheet.binary.macroEnabled.12");
        fileExtensionMap.put("ppt", "application/vnd.ms-powerpoint");
        fileExtensionMap.put("pot", "application/vnd.ms-powerpoint");
        fileExtensionMap.put("pps", "application/vnd.ms-powerpoint");
        fileExtensionMap.put("ppa", "application/vnd.ms-powerpoint");
        fileExtensionMap.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
        fileExtensionMap.put("potx", "application/vnd.openxmlformats-officedocument.presentationml.template");
        fileExtensionMap.put("ppsx", "application/vnd.openxmlformats-officedocument.presentationml.slideshow");
        fileExtensionMap.put("ppam", "application/vnd.ms-powerpoint.addin.macroEnabled.12");
        fileExtensionMap.put("pptm", "application/vnd.ms-powerpoint.presentation.macroEnabled.12");
        fileExtensionMap.put("potm", "application/vnd.ms-powerpoint.presentation.macroEnabled.12");
        fileExtensionMap.put("ppsm", "application/vnd.ms-powerpoint.slideshow.macroEnabled.12");
        // Open Office
        fileExtensionMap.put("odt", "application/vnd.oasis.opendocument.text");
        fileExtensionMap.put("ott", "application/vnd.oasis.opendocument.text-template");
        fileExtensionMap.put("oth", "application/vnd.oasis.opendocument.text-web");
        fileExtensionMap.put("odm", "application/vnd.oasis.opendocument.text-master");
        fileExtensionMap.put("odg", "application/vnd.oasis.opendocument.graphics");
        fileExtensionMap.put("otg", "application/vnd.oasis.opendocument.graphics-template");
        fileExtensionMap.put("odp", "application/vnd.oasis.opendocument.presentation");
        fileExtensionMap.put("otp", "application/vnd.oasis.opendocument.presentation-template");
        fileExtensionMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
        fileExtensionMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet-template");
        fileExtensionMap.put("odc", "application/vnd.oasis.opendocument.chart");
        fileExtensionMap.put("odf", "application/vnd.oasis.opendocument.formula");
        fileExtensionMap.put("odb", "application/vnd.oasis.opendocument.database");
        fileExtensionMap.put("odi", "application/vnd.oasis.opendocument.image");
        fileExtensionMap.put("oxt", "application/vnd.openofficeorg.extension");
    }

    public static String getContentTypeByFileName(String fileName) {
        // 1. first use java's buildin utils
        FileNameMap mimeTypes = URLConnection.getFileNameMap();
        String contentType = mimeTypes.getContentTypeFor(fileName);
        // 2. nothing found -> lookup our in extension map to find types like ".doc" or ".docx"
        if (!StringUtils.hasText(contentType)) {
            String extension = FilenameUtils.getExtension(fileName);
            contentType = fileExtensionMap.get(extension);
        }
        return contentType;
    }
}

回答by Thad

I use javax.activation.MimetypesFileTypeMap. It starts with a small set: $JRE_HOME/lib/content-types.properties, but you can add you own. Create a file mime.typesin the format shown in MimetypesFileTypeMap's javadoc (I started with a large list from the net, massaged it, and added types I found missing). Now you can add that in your code by opening your mime.typesfile and adding its contents to your map. However the easier solution is to add your mime.typesfile to the META-INFof your jar. java.activationwill pick that up automagically.

我用javax.activation.MimetypesFileTypeMap. 它从一个小集合开始:$JRE_HOME/lib/content-types.properties,但你可以添加你自己的。mime.typesMimetypesFileTypeMap的 javadoc 中显示的格式创建一个文件(我从网上的一个大列表开始,对其进行了整理,并添加了我发现丢失的类型)。现在,您可以通过打开mime.types文件并将其内容添加到地图中来将其添加到代码中。然而,更简单的解决方案是将您的mime.types文件添加到META-INF您的 jar文件中。java.activation会自动捡起来。