Read PDF files using Java
Original source: http://stackoverflow.com/questions/4015477/
Notice: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Asked by Rim
I want to parse PDF files hosted on websites.
Can anyone tell me how to extract all the words (word by word) from a PDF file using Java?
The code below extracts content from a PDF file and writes it to another PDF file. I want the program to write it to a text file instead.
    import java.io.FileOutputStream;
    import java.io.IOException;
    import com.itextpdf.text.*;
    import com.itextpdf.text.pdf.*;

    public class pdf {
        private static String INPUTFILE = "http://www.britishcouncil.org/learning-infosheets-medicine.pdf";
        private static String OUTPUTFILE = "c:/new3.pdf";

        public static void main(String[] args) throws DocumentException, IOException {
            Document document = new Document();
            PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(OUTPUTFILE));
            document.open();
            PdfReader reader = new PdfReader(INPUTFILE);
            int n = reader.getNumberOfPages();
            PdfImportedPage page;
            for (int i = 1; i <= n; i++) {
                page = writer.getImportedPage(reader, i);
                Image instance = Image.getInstance(page);
                document.add(instance);
            }
            document.close();
        }
    }
Thanks in advance
Answered by Leniel Maccaferri
Take a look at this:
How to Read PDF File in Java (uses the Apache PDFBox library)
Answered by dina
Using org.apache.pdfbox:
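    import java.io.FileInputStream;
    import java.io.IOException;
    import org.apache.pdfbox.io.IOUtils;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;

    // Assumes PDFBox 2.x; in 3.x, PDDocument.load was replaced by Loader.loadPDF.
    // The wrapper class name PdfToText is illustrative only.
    public class PdfToText {
        public static String convertPDFToTxt(String filePath) throws IOException {
            byte[] thePDFFileBytes = readFileAsBytes(filePath);
            // try-with-resources closes the document even if text extraction fails
            try (PDDocument pddDoc = PDDocument.load(thePDFFileBytes)) {
                PDFTextStripper reader = new PDFTextStripper();
                return reader.getText(pddDoc);
            }
        }

        private static byte[] readFileAsBytes(String filePath) throws IOException {
            // Close the stream once all bytes have been read
            try (FileInputStream inputStream = new FileInputStream(filePath)) {
                return IOUtils.toByteArray(inputStream);
            }
        }
    }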
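
The question also asks for the words one by one, written to a text file, which the method above does not cover because it only returns a single string. A minimal sketch of that last step follows, using only the standard library. The class name WordsToTextFile, both method names, and the default output file words.txt are hypothetical and not part of the original answer. It also assumes that a "word" is any run of characters between whitespace, so punctuation stays attached to the word next to it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns extracted PDF text into a word-per-line text file.
public class WordsToTextFile {

    // Splits text on runs of whitespace (spaces, tabs, line breaks).
    // Assumption: punctuation is kept attached to the neighbouring word.
    public static List<String> splitIntoWords(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) { // blank input yields one empty token; skip it
                words.add(token);
            }
        }
        return words;
    }

    // Writes one word per line as UTF-8, replacing the file if it already exists.
    public static void writeWords(Path outputFile, List<String> words) throws IOException {
        Files.write(outputFile, words, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // In practice this string would be the result of convertPDFToTxt(...);
        // a literal stands in here so the sketch runs without PDFBox.
        String extractedText = "Medicine is a popular\nsubject of study.";
        // Output path: first argument if given, otherwise a placeholder name.
        Path out = Paths.get(args.length > 0 ? args[0] : "words.txt");
        writeWords(out, splitIntoWords(extractedText));
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

To combine the two pieces, pass the string returned by convertPDFToTxt to splitIntoWords and hand the resulting list to writeWords. If punctuation should be stripped as well, the split pattern would need to change, for example to split on non-letter characters.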