Convert .docx to HTML using Java

Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/24652953/

Time: 2020-08-14 13:48:57  Source: igfitidea

Tags: java, apache-tika

Asked by Vignesh Paramasivam

I tried converting .doc to HTML using WordToHtmlConverter, and it worked perfectly.

But when I tried to convert .docx to HTML, I got stuck.
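
`WordToHtmlConverter` belongs to Apache POI's HWPF component, which reads the binary .doc (OLE2) format. A .docx file is a different format: an Office Open XML package, i.e. a ZIP archive of XML parts (the `word/media` path in the accepted answer mirrors the folder where a .docx stores embedded images). The JDK-only sketch below lists a file's package parts and treats the conventional part name `word/document.xml` as a heuristic for spotting a .docx; the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class DocxPackageCheck {

    /** Lists the part names stored in a .docx, which is an OOXML ZIP package. */
    public static List<String> listParts(File file) throws IOException {
        try (ZipFile zip = new ZipFile(file)) {
            return zip.stream().map(ZipEntry::getName).collect(Collectors.toList());
        }
    }

    /** Heuristic: true if the file is a ZIP containing the conventional main Word part. */
    public static boolean looksLikeDocx(File file) {
        try {
            return listParts(file).contains("word/document.xml");
        } catch (IOException e) {
            // A binary .doc (OLE2 compound file) is not a ZIP, so ZipFile fails here
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args[0]);
        boolean docx = looksLikeDocx(file);
        System.out.println("looks like .docx: " + docx);
        if (docx) {
            // Typical entries include word/document.xml, word/styles.xml, word/media/...
            listParts(file).forEach(System.out::println);
        }
    }
}
```

Run it as `java DocxPackageCheck filename.docx`; for a binary .doc file `ZipFile` throws, so the check prints false.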

What I tried:

I used the following code to convert .docx to HTML:

I took the code from: How to use Tika's XWPFWordExtractorDecorator class?

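        import java.io.File;
        import java.io.InputStream;
        import java.io.StringWriter;
        import javax.xml.transform.OutputKeys;
        import javax.xml.transform.sax.SAXTransformerFactory;
        import javax.xml.transform.sax.TransformerHandler;
        import javax.xml.transform.stream.StreamResult;
        import org.apache.tika.io.TikaInputStream;
        import org.apache.tika.metadata.Metadata;
        import org.apache.tika.parser.AutoDetectParser;
        import org.apache.tika.parser.ParseContext;
        import org.apache.tika.parser.Parser;

        public class TikaDocxToHtml {
            public static void main(String[] args) throws Exception {
                // Backslashes must be escaped in Java string literals
                InputStream input = TikaInputStream.get(new File("C:\\Users\\Downloads\\filename.docx"));
                Parser parser = new AutoDetectParser();

                // Serialize the SAX events produced by Tika as indented HTML
                StringWriter sw = new StringWriter();
                SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
                TransformerHandler handler = factory.newTransformerHandler();
                handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
                handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
                handler.setResult(new StreamResult(sw));

                try {
                    Metadata metadata = new Metadata();
                    parser.parse(input, handler, metadata, new ParseContext());
                    String xml = sw.toString();
                    System.out.print("tika : " + xml);
                } finally {
                    input.close();
                }
            }
        }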

The output I got is:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title/>
</head>
<body/>
</html>
  • Please explain where I went wrong.
  • Is there a better way to convert .docx to an HTML string?

Appreciate your help, thanks.
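
An empty `<body/>` like the one above is a common symptom of `AutoDetectParser` having no parser registered for .docx: `tika-core` alone provides detection but not the Office parsers, which ship in the Tika parsers module (`tika-parsers` in Tika 1.x, `tika-parsers-standard-package` in 2.x) together with Apache POI. Whether that was the cause in this question is an assumption. A minimal check, assuming Tika is on the classpath (the class name `TikaDocxSupportCheck` is hypothetical):

```java
import java.util.Set;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;

public class TikaDocxSupportCheck {

    // MIME type that Tika detects for .docx files
    public static final MediaType DOCX = MediaType.application(
            "vnd.openxmlformats-officedocument.wordprocessingml.document");

    /** True if the default AutoDetectParser has a parser registered for .docx. */
    public static boolean canParseDocx() {
        AutoDetectParser parser = new AutoDetectParser();
        Set<MediaType> supported = parser.getSupportedTypes(new ParseContext());
        return supported.contains(DOCX);
    }

    public static void main(String[] args) {
        if (canParseDocx()) {
            System.out.println("A .docx parser is registered with AutoDetectParser.");
        } else {
            // Without one, AutoDetectParser falls back to an empty parser, yielding <body/>
            System.out.println("No .docx parser found: add the Tika parsers module to the classpath.");
        }
    }
}
```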

Accepted answer by Vignesh Paramasivam

This code worked for me to convert .docx to HTML (it uses XDocReport's XHTMLConverter on top of Apache POI's XWPFDocument):

You can also look at this link: Link to code

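        import java.io.ByteArrayOutputStream;
        import java.io.File;
        import java.io.FileInputStream;
        import java.io.InputStream;
        import org.apache.poi.xwpf.usermodel.XWPFDocument;
        // XDocReport 2.x packages; 1.x releases used org.apache.poi.xwpf.converter.*
        import fr.opensagres.poi.xwpf.converter.core.FileURIResolver;
        import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
        import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;

        public class DocxToHtml {
            public static void main(String[] args) throws Exception {
                String path = args[0];
                // Convert .docx to an HTML string; the input stream is closed automatically
                try (InputStream in = new FileInputStream(new File(path))) {
                    XWPFDocument document = new XWPFDocument(in);
                    // Image src attributes are resolved against the word/media folder
                    XHTMLOptions options = XHTMLOptions.create()
                            .URIResolver(new FileURIResolver(new File("word/media")));
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    XHTMLConverter.getInstance().convert(document, out, options);
                    String html = out.toString();
                    System.out.println(html);
                }
            }
        }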
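
The code above only sets a `URIResolver`, which controls how image `src` attributes are written. As far as I know, XDocReport writes the image files themselves only when an image extractor is configured as well. A hedged sketch, assuming XDocReport 2.x package names (1.x used `org.apache.poi.xwpf.converter.*`); the class name and the three file paths passed on the command line are hypothetical:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
// XDocReport 2.x package names; 1.x releases used org.apache.poi.xwpf.converter.*
import fr.opensagres.poi.xwpf.converter.core.FileImageExtractor;
import fr.opensagres.poi.xwpf.converter.core.FileURIResolver;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;

public class DocxToHtmlWithImages {

    /** Writes docx as HTML to htmlFile and copies embedded images under imageDir. */
    public static void convert(File docx, File htmlFile, File imageDir) throws Exception {
        try (InputStream in = new FileInputStream(docx);
             OutputStream out = new FileOutputStream(htmlFile)) {
            XWPFDocument document = new XWPFDocument(in);
            XHTMLOptions options = XHTMLOptions.create();
            // Extract embedded images from the package into imageDir
            options.setExtractor(new FileImageExtractor(imageDir));
            // Point <img src> attributes at the extracted files
            options.URIResolver(new FileURIResolver(imageDir));
            XHTMLConverter.getInstance().convert(document, out, options);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical arguments: input .docx, output .html, image folder
        convert(new File(args[0]), new File(args[1]), new File(args[2]));
    }
}
```

Writing straight to a file also sidesteps the question of which charset to use when turning the byte buffer into a `String`.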

Answer by Rakshit Singh

You may want to use the Mammoth .docx-to-HTML library. It converts .docx documents to HTML so they can be displayed, and it can run in the browser as well as on the backend.
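
Mammoth also has a Java port. It aims for clean, semantic HTML (for example, headings from Word heading styles) rather than an exact copy of the formatting, and it handles .docx only, so .doc files still need another route such as WordToHtmlConverter. A minimal sketch, assuming the `org.zwobble.mammoth` library is on the classpath (the class name `MammothDocxToHtml` is hypothetical):

```java
import java.io.File;
import java.io.IOException;
import org.zwobble.mammoth.DocumentConverter;
import org.zwobble.mammoth.Result;

public class MammothDocxToHtml {

    /** Converts a .docx file to an HTML string using Mammoth's Java port. */
    public static String convert(File docx) throws IOException {
        DocumentConverter converter = new DocumentConverter();
        Result<String> result = converter.convertToHtml(docx);
        // Mammoth reports content it could not map (e.g. unknown styles) as warnings
        for (String warning : result.getWarnings()) {
            System.err.println("mammoth warning: " + warning);
        }
        return result.getValue();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(convert(new File(args[0])));
    }
}
```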
