Disclaimer: This page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/18149857/
Converting a PDF to a Word document using Java
Question by
I've successfully converted a JPEG to PDF using Java, but I don't know how to convert a PDF to Word. The code I use to convert JPEG to PDF is given below.
Can anyone tell me how to convert a PDF to Word (.doc/.docx) using Java?
import java.io.FileOutputStream;

import com.itextpdf.text.Document;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfWriter;

public class JpegToPDF {
    public static void main(String[] args) {
        try {
            // Create an empty PDF document and bind a writer that streams it to disk.
            // Backslashes in Windows paths must be escaped in Java string literals.
            Document convertJpgToPdf = new Document();
            PdfWriter.getInstance(convertJpgToPdf,
                    new FileOutputStream("c:\\java\\ConvertImagetoPDF.pdf"));
            convertJpgToPdf.open();
            // Load the JPEG and add it to the PDF as an image element
            Image convertJpg = Image.getInstance("c:\\java\\test.jpg");
            convertJpgToPdf.add(convertJpg);
            // Closing the document finishes the PDF and writes it out
            convertJpgToPdf.close();
            System.out.println("Successfully Converted JPG to PDF in iText");
        } catch (Exception i1) {
            i1.printStackTrace();
        }
    }
}
Answer by Raghu Chandra
You can use the 7-pdf library.
Have a look at this; it may help:
http://www.7-pdf.de/sites/default/files/guide/manuals/library/index.html
PS: iText has some issues when the given file is a non-RGB image, so give this a try!
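If you stay with iText and run into the non-RGB problem mentioned above, one possible workaround (not from the answer, and not tested against iText itself) is to redraw the image onto a plain RGB canvas with the JDK's Java2D classes before handing it to iText. A minimal sketch; the class RgbNormalizer and its methods are hypothetical names:

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;

public class RgbNormalizer {

    /** Returns true if the image's color model uses an RGB color space. */
    static boolean isRgb(BufferedImage img) {
        return img.getColorModel().getColorSpace().getType() == ColorSpace.TYPE_RGB;
    }

    /**
     * Redraws the image onto an opaque 8-bit-per-channel RGB canvas.
     * The canvas is filled white first, so transparent areas end up white.
     */
    static BufferedImage toRgb(BufferedImage src) {
        if (src.getType() == BufferedImage.TYPE_INT_RGB) {
            return src; // already in the target format
        }
        BufferedImage rgb = new BufferedImage(src.getWidth(), src.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(src, 0, 0, null); // Java2D converts the colors while drawing
        } finally {
            g.dispose();
        }
        return rgb;
    }

    public static void main(String[] args) {
        BufferedImage gray = new BufferedImage(8, 8, BufferedImage.TYPE_BYTE_GRAY);
        BufferedImage rgb = toRgb(gray);
        System.out.println(isRgb(gray) + " -> " + isRgb(rgb));
    }
}
```

The converted image could then be saved with ImageIO.write(rgb, "jpg", file) and passed to Image.getInstance(...) as in the question. Note that the JDK's built-in JPEG reader may be unable to decode CMYK JPEGs at all, so this only helps with images that ImageIO.read can load.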
Answer by stefan.schwetschke
In fact, you need two libraries, both open source. The first is iText, which is used to extract the text from the PDF file. The second is Apache POI, which is used to create the Word document.
The code is quite simple:
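// Assumes iText 5 (com.itextpdf:itextpdf) and Apache POI (poi-ooxml) on the classpath
import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;
import org.apache.poi.xwpf.usermodel.BreakType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class PdfToWord {
    public static void main(String[] args) throws IOException {
        // Create the Word document
        XWPFDocument doc = new XWPFDocument();
        // Open the PDF file
        String pdf = "myfile.pdf";
        PdfReader reader = new PdfReader(pdf);
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        // Read the PDF page by page (iText page numbers start at 1)
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            TextExtractionStrategy strategy =
                    parser.processContent(i, new SimpleTextExtractionStrategy());
            // Extract the text
            String text = strategy.getResultantText();
            // Create a new paragraph in the Word document, adding the extracted text
            XWPFParagraph p = doc.createParagraph();
            XWPFRun run = p.createRun();
            run.setText(text);
            // Add a page break
            run.addBreak(BreakType.PAGE);
        }
        // Write the Word document
        FileOutputStream out = new FileOutputStream("myfile.docx");
        doc.write(out);
        // Close all open files
        out.close();
        reader.close();
    }
}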
Beware: with this extraction strategy you will lose all formatting. But you can fix this by plugging in your own, more complex extraction strategy.
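A related, smaller issue: the code above puts each page's text into a single run. SimpleTextExtractionStrategy separates lines with newline characters, which Word generally does not display as line breaks inside a run, so a whole page tends to show up as one block. A small pre-processing step could split the page text into lines so that each becomes its own paragraph. This is a hypothetical helper (not part of the answer, Java 11+), and it does not recover fonts or styles; that still needs a custom extraction strategy.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PageTextSplitter {

    /**
     * Splits one page of extracted text into lines, one per future Word paragraph.
     * Trailing whitespace is stripped and blank lines are dropped.
     */
    static List<String> toParagraphTexts(String pageText) {
        return pageText.lines()                  // splits on \n, \r and \r\n
                .map(String::stripTrailing)      // remove trailing spaces/tabs
                .filter(line -> !line.isBlank()) // skip empty lines
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String page = "Chapter 1\r\nFirst line  \n\nSecond line\n";
        toParagraphTexts(page).forEach(System.out::println);
    }
}
```

Inside the page loop, each returned string would then get its own paragraph, for example doc.createParagraph().createRun().setText(line), with the page break added to the last run. Dropping blank lines is a design choice; keep them if you want to preserve the vertical spacing.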
Answer by Haroldo_OK
Although it's far from being a pure Java solution, OpenOffice/LibreOffice allows you to connect to it through a TCP port, and you can use that connection to convert documents. If this looks like an acceptable solution, JODConverter can help you.
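The answer talks to a running office instance over TCP (for example through JODConverter). A simpler variant, not from the answer, is to launch LibreOffice's headless command line directly from Java. The sketch below assumes that soffice is on the PATH and that your LibreOffice version accepts --infilter=writer_pdf_import, which asks it to open the PDF in Writer rather than Draw so that it can be exported as DOCX; the class and method names are hypothetical, and the fidelity of the result varies from document to document.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

public class LibreOfficePdfToDocx {

    /** Builds a headless LibreOffice command that converts a PDF to DOCX in outDir. */
    static List<String> buildCommand(String soffice, File pdf, File outDir) {
        return List.of(
                soffice,
                "--headless",
                "--infilter=writer_pdf_import", // assumption: open the PDF in Writer
                "--convert-to", "docx",
                "--outdir", outDir.getPath(),
                pdf.getPath());
    }

    /** Runs the conversion and returns LibreOffice's exit code. */
    static int convert(String soffice, File pdf, File outDir)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(soffice, pdf, outDir))
                .inheritIO() // show LibreOffice's messages in this console
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command; call convert(...) to actually run it
        List<String> cmd = buildCommand("soffice", new File("myfile.pdf"), new File("."));
        System.out.println(String.join(" ", cmd));
    }
}
```

Calling convert("soffice", new File("myfile.pdf"), new File(".")) would then be expected to write myfile.docx into the current directory. Compared with the TCP approach, this starts a new LibreOffice process per file, which is simpler to set up but slower for batch conversions.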