Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/24652953/
Convert .docx to HTML using JAVA
Question by Vignesh Paramasivam
I tried converting .doc to HTML using WordToHtmlConverter, and it worked perfectly.
But when I tried to convert .docx to HTML, I got stuck.
What I tried:
I used the code below to convert .docx to HTML:
I took the code from: How to use Tika's XWPFWordExtractorDecorator class?
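import java.io.File;
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;

// Backslashes in a Java string literal must be escaped ("\U" does not compile)
InputStream input = TikaInputStream.get(new File("C:\\Users\\Downloads\\filename.docx"));
Parser parser = new AutoDetectParser();
StringWriter sw = new StringWriter();
// Serialize the SAX events produced by the parser as indented HTML
SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
TransformerHandler handler = factory.newTransformerHandler();
handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
handler.setResult(new StreamResult(sw));
try {
    Metadata metadata = new Metadata();
    parser.parse(input, handler, metadata, new ParseContext());
    String xml = sw.toString();
    System.out.print("tika : " + xml);
} finally {
    input.close();
}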
The output I got was:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title/>
</head>
<body/>
</html>
- Where did I go wrong?
- Is there a better way to convert .docx to an HTML string?
I appreciate your help. Thanks!
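The question does not show the project's dependencies, so the following is an assumption: a frequent cause of exactly this output (an XHTML skeleton with an empty body) is that only tika-core is on the classpath. AutoDetectParser discovers parsers at runtime; when none is available for the file type it falls back to an empty parser that emits nothing but the document skeleton. Adding Tika's parser modules (tika-parsers in Tika 1.x, tika-parsers-standard-package in 2.x) usually fixes it. A minimal, JDK-only sketch that checks whether Tika's .docx parser class, org.apache.tika.parser.microsoft.ooxml.OOXMLParser, can be loaded (the helper class and its name are hypothetical):

```java
// Hypothetical diagnostic helper: reports whether a class can be loaded from the current classpath.
public class TikaClasspathCheck {

    // Tika's parser for .docx/.xlsx/.pptx; it ships in the parser modules, not in tika-core
    static final String OOXML_PARSER = "org.apache.tika.parser.microsoft.ooxml.OOXMLParser";

    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only check that the class exists, without running static initializers
            Class.forName(className, false, TikaClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError: the class file exists but one of its own dependencies is missing
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(OOXML_PARSER)) {
            System.out.println("Tika OOXML parser found; .docx files should be parsed.");
        } else {
            System.out.println("Tika OOXML parser missing; AutoDetectParser will likely return an empty document for .docx.");
        }
    }
}
```

Running this with the same classpath as the failing program gives a quick yes/no before digging into the parsing code itself.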
Accepted answer by Vignesh Paramasivam
This code worked for me to convert .docx to HTML:
You can also look at the link: Link to code
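import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.xwpf.converter.core.FileURIResolver;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
// convert .docx to an HTML string (XDocReport 1.x packages; 2.x uses fr.opensagres.poi.xwpf.converter.*)
FileInputStream in = new FileInputStream(new File(path));
XWPFDocument document = new XWPFDocument(in);
// image references in the generated HTML are resolved against the word/media folder
XHTMLOptions options = XHTMLOptions.create().URIResolver(new FileURIResolver(new File("word/media")));
ByteArrayOutputStream out = new ByteArrayOutputStream();
XHTMLConverter.getInstance().convert(document, out, options);
String html = out.toString();
System.out.println(html);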
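To show what converters such as XHTMLConverter (from the XDocReport project) work with, here is a deliberately minimal, JDK-only sketch. It is not the accepted answer's method: it relies on the fact that a .docx file is a ZIP package whose main text lives in word/document.xml, with paragraphs as w:p elements and text inside w:t elements. It ignores styles, tables, lists, images, hyperlinks and text boxes, and emits one <p> per paragraph; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration: plain paragraphs only, no formatting.
public class MinimalDocxToHtml {

    // WordprocessingML namespace of the w:p and w:t elements in word/document.xml
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static String convert(InputStream docx) throws Exception {
        // 1. Find word/document.xml inside the ZIP package
        byte[] documentXml = null;
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    documentXml = zip.readAllBytes(); // reads to the end of this entry
                    break;
                }
            }
        }
        if (documentXml == null) {
            throw new IOException("word/document.xml not found; not a .docx file?");
        }

        // 2. Parse it namespace-aware, refusing DOCTYPEs (XXE protection for untrusted uploads)
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(documentXml));

        // 3. Emit one <p> per w:p, concatenating the text of its w:t elements
        StringBuilder html = new StringBuilder("<html><body>");
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element p = (Element) paragraphs.item(i);
            StringBuilder text = new StringBuilder();
            NodeList texts = p.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                text.append(texts.item(j).getTextContent());
            }
            html.append("<p>").append(escape(text.toString())).append("</p>");
        }
        return html.append("</body></html>").toString();
    }

    // Escape the characters that are special in HTML text content
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            System.out.println(convert(in));
        }
    }
}
```

Everything a real converter adds on top of this (run properties, numbering, relationships for images and links) is why a library is the practical choice for anything beyond plain text.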
Answer by Rakshit Singh
You may want to use the Mammoth library, which converts .docx documents to HTML. It can run in the browser, so documents can be displayed by converting them client-side, and it can also be used on the backend.
- Supported platforms: JavaScript, both in the browser and in Node.js (available on npm); Python (available on PyPI); WordPress; Java/JVM (available on Maven Central); .NET (available on NuGet).
- Link: https://mike.zwobble.org/projects/mammoth/ (demo and article)
- Github: https://github.com/mwilliamson/mammoth.js