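Convert PDF to HTML file Java API
Notice: this page is a side-by-side Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/22906188/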
Convert PDF to HTML file Java API
Asked by user3505725
I want to convert a PDF file to an HTML file from a Java application. The PDF file contains images, text, etc. Does anybody know a good Java API? (Please don't suggest Aspose.) I tried Apache PDFBox but was not satisfied with it.
Answered by 4dgaurav
Check out:
JPedal: it handles embedded fonts very well, but it is not free.
IcePDF: it is free, but as far as I know it can only extract text/images or render the PDF to an image.
public class QHyperArticleHtmlBuilder extends QHtmlBuilder {
QStyle anchorStyle = createStyle("anchorStyle", a);
QStyle sectionStyle = createStyle("sectionStyle", div);
QStyle subsectionStyle = createStyle("subsectionStyle", div);
...
public String buildSubSectionHeading(String anchorName, String text) {
return buildAnchorHeading(subsectionStyle, anchorName, text);
}
protected String buildAnchorHeading(QStyle divStyle,
String anchorName, String text) {
QMutableElement element = create(p);
element.add(br);
element.add(create(a, anchorStyle, name.create(anchorName)))
.add(create(div, divStyle, text));
return element.buildHtml();
}
public String buildLink(String url, String label) {
QMutableElement element = create(a, anchorStyle, href.create(url));
element.add(create(span, underlineStyle))
.add(create(span, linkStyle, label));
return element.buildHtml();
}
}
pre.javaStyle {
font-family: courier new, courier, mono;
background-color: #fbfbfb;
font-size: 11pt;
width: 800px;
border: dashed 1px;
border-color: lightgray;
padding-left: 4px;
}
Resources here
Answered by radkovo
CSSBox Pdf2Dom is a Java library that allows (among other things) converting PDF to HTML. The distribution even contains a PDFToHTML command line tool based on this library, so you can check whether the results correspond to your needs. However, converting PDF to HTML is always tricky, as noted above. The results depend on the complexity and structure of the particular PDF file, so different tools may be suitable for different PDF files.
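To illustrate why the results depend so much on how a PDF is structured, here is a minimal sketch of one common way to express PDF content as HTML: each run of text becomes an absolutely positioned element at the coordinates it had on the page. This is not Pdf2Dom's (or any other library's) actual implementation. The class PositionedHtmlSketch, the TextFragment type, and the methods escapeHtml, toDiv and toPage are all hypothetical names invented for this example, and the fragments are assumed to have already been extracted from the PDF by some other tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: none of these names come from Pdf2Dom or any other library.
public class PositionedHtmlSketch {

    // One run of text with the position and font size it had on the PDF page (in points).
    static final class TextFragment {
        final double x;
        final double y;
        final double fontSize;
        final String text;

        TextFragment(double x, double y, double fontSize, String text) {
            this.x = x;
            this.y = y;
            this.fontSize = fontSize;
            this.text = text;
        }
    }

    // Escape the characters that are special in HTML text and attribute values.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Render one fragment as an absolutely positioned div.
    static String toDiv(TextFragment f) {
        return String.format(Locale.ROOT,
                "<div style=\"position:absolute;left:%.1fpt;top:%.1fpt;font-size:%.1fpt\">%s</div>",
                f.x, f.y, f.fontSize, escapeHtml(f.text));
    }

    // Wrap all fragments of a page in a relatively positioned container of the page size.
    static String toPage(List<TextFragment> fragments, double widthPt, double heightPt) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format(Locale.ROOT,
                "<div class=\"page\" style=\"position:relative;width:%.1fpt;height:%.1fpt\">%n",
                widthPt, heightPt));
        for (TextFragment f : fragments) {
            sb.append("  ").append(toDiv(f)).append(System.lineSeparator());
        }
        sb.append("</div>").append(System.lineSeparator());
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up fragments standing in for text extracted from an A4 page.
        List<TextFragment> fragments = new ArrayList<>();
        fragments.add(new TextFragment(72, 72, 18, "Quarterly report"));
        fragments.add(new TextFragment(72, 100, 11, "Revenue & costs"));
        System.out.print(toPage(fragments, 595, 842));
    }
}
```

Output of this kind keeps the visual layout but loses logical structure such as paragraphs, reading order and tables, which is one reason different tools can suit different PDF files.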
Answered by alex
You may try Print2Flash: www.print2flash.com. From Java, it can convert not only PDFs to HTML but other kinds of documents as well: Office docs, AutoCAD drawings, etc. It solved all document publishing needs for our company web site.
Answered by Rob
Perhaps you can use this API: https://market.mashape.com/netservice/convert-pdf-to-html (works for Java, Node, PHP, etc.)
Answered by Leila Holmann
Try our Java library called jPDFWeb, which preserves fonts and image resolution from the original PDF. You can upload your own PDF and try the live demo.