Java: Convert Word to HTML with Apache POI

Note: this page is a Chinese-English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/7868713/

Time: 2020-10-30 21:46:35  Source: igfitidea

Convert Word to HTML with Apache POI

java apache-poi

Asked by Ron

I see that there is a converter called WordToHtmlConverter, but its process method is not exposed. How should I pass in a doc file and get an HTML file (or an OutputStream) back?

Answered by Ron

This code is now working for me!

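    import java.io.ByteArrayOutputStream;
    import java.io.FileInputStream;
    import java.nio.charset.StandardCharsets;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.poi.hwpf.HWPFDocumentCore;
    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.apache.poi.hwpf.converter.WordToHtmlUtils;
    import org.w3c.dom.Document;

    // The statements below belong inside a method that declares "throws Exception".
    // Load the binary .doc file; backslashes in the Windows path must be escaped.
    HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(new FileInputStream("D:\\temp\\seo\\1.doc"));

    // Convert the Word document into a W3C DOM tree that holds the HTML.
    WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .newDocument());
    wordToHtmlConverter.processDocument(wordDocument);
    Document htmlDocument = wordToHtmlConverter.getDocument();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DOMSource domSource = new DOMSource(htmlDocument);
    StreamResult streamResult = new StreamResult(out);

    // Serialize the DOM tree as indented UTF-8 HTML.
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer serializer = tf.newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");
    serializer.transform(domSource, streamResult);
    out.close();

    // Decode with the charset the serializer wrote, not the platform default.
    String result = new String(out.toByteArray(), StandardCharsets.UTF_8);
    System.out.println(result);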
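
The second half of the snippet above (DOM tree to HTML bytes) does not depend on POI, so it can be tried on its own. The following is only a minimal sketch using JDK classes: the class name `DomToHtmlSketch` and the helpers `writeHtml`, `sampleDocument`, and `toHtmlString` are hypothetical names introduced for illustration, and the hand-built DOM tree merely stands in for whatever `wordToHtmlConverter.getDocument()` would return. Because `writeHtml` accepts any `OutputStream`, the same call could presumably target a `FileOutputStream` to get an HTML file, which is what the question asks for.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration class; not part of Apache POI.
class DomToHtmlSketch {

    // Serializes a DOM tree as indented UTF-8 HTML into any OutputStream.
    static void writeHtml(Document htmlDocument, OutputStream out) {
        try {
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            serializer.transform(new DOMSource(htmlDocument), new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize DOM to HTML", e);
        }
    }

    // Builds a tiny DOM tree standing in for the converter's output (assumption).
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element paragraph = doc.createElement("p");
            paragraph.setTextContent("Hello from a DOM tree");
            body.appendChild(paragraph);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create a DOM document", e);
        }
    }

    // Convenience wrapper: serialize into memory and decode as UTF-8.
    static String toHtmlString(Document htmlDocument) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeHtml(htmlDocument, out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(toHtmlString(sampleDocument()));
    }
}
```

Decoding with an explicit `StandardCharsets.UTF_8` keeps the string consistent with the `ENCODING` output property, regardless of the platform default charset.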