Java: How to read a PDF file using Selenium
Original source: http://stackoverflow.com/questions/40738373/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to read a PDF file using Selenium
Asked by Bugasur
I am working on a web page that has a link which, when clicked, opens a PDF file in a new window. I have to read that PDF file to validate some data against the transactions that were done. One way is to download the file and then read it. Can anyone help me out with this? I have to work with IE 11.
Thanks in advance.
Answered by Kenil Fadia
Use PDFBox and FontBox.
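import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public String readPDFInURL() throws IOException {
    WebDriver driver = new FirefoxDriver();
    try {
        // Page with the example PDF document
        driver.get("file:///C:/Users/admin/Downloads/dotnet_TheRaceforEmpires.pdf");
        URL url = new URL(driver.getCurrentUrl());
        // try-with-resources closes the document first, then the stream
        try (InputStream fileToParse = new BufferedInputStream(url.openStream());
             PDDocument document = PDDocument.load(fileToParse)) {
            // Extract the text content of the document (PDFBox 2.0.x API)
            return new PDFTextStripper().getText(document);
        }
    } finally {
        // Close the browser that this method started
        driver.quit();
    }
}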
Because some functions from older versions of PDFBox have been deprecated, FontBox is needed as an additional library alongside PDFBox. I used PDFBox 2.0.3 and FontBox 2.0.3, and it works fine. It won't read images, though.
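The question mentions that the PDF opens in a new window, whereas the snippet above loads the file directly. A minimal sketch of how the new window might be identified before reading its URL follows. The class and method names are hypothetical (they are not part of the answer), and the helper only works on the plain handle strings that Selenium's getWindowHandles() returns, so it has no Selenium dependency itself.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper: picks the window that appeared after clicking the PDF link. */
final class NewWindowFinder {

    private NewWindowFinder() {}

    /**
     * @param handlesBefore handles captured before the click, e.g. driver.getWindowHandles()
     * @param handlesAfter  handles captured after the click
     * @return the first handle present only in handlesAfter, or empty if no window opened
     */
    static Optional<String> findNewHandle(Set<String> handlesBefore, Set<String> handlesAfter) {
        // Copy so the caller's set is not modified
        Set<String> added = new LinkedHashSet<>(handlesAfter);
        added.removeAll(handlesBefore);
        return added.stream().findFirst();
    }
}
```

In a test this would presumably be used by capturing driver.getWindowHandles() before and after clicking the link, passing both sets to findNewHandle, calling driver.switchTo().window(handle) with the result, and then reading driver.getCurrentUrl(). Whether IE 11 reports the PDF's URL for that window is an assumption worth checking in your environment.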
Answered by Ankit Gupta
First, download the PDFBox jar.
strURL is a web URL that points to a .pdf file, for example https://example.com/downloads/presence/Online-Presence-CA-05-02-2017-04-13.pdf
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.net.URL;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public boolean verifyPDFContent(String strURL, String text) {
    String output = "";
    // try-with-resources closes both the document and the underlying stream
    try (InputStream file = new BufferedInputStream(new URL(strURL).openStream());
         PDDocument document = PDDocument.load(file)) {
        output = new PDFTextStripper().getText(document);
        System.out.println(output);
    } catch (Exception e) {
        // On a download or parse failure, output stays empty
        e.printStackTrace();
    }
    // True only if the extracted text contains the expected string
    return output.contains(text);
}
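One caveat with the contains check above: text extracted from a PDF typically carries line breaks and uneven spacing, so an expected phrase that wraps across lines may fail to match. A minimal sketch of a whitespace-tolerant comparison, using only the JDK, is shown below. The class and method names are hypothetical, and the `extracted` argument stands for the string returned by PDFTextStripper.getText.

```java
/** Hypothetical helper for comparing extracted PDF text with an expected value. */
final class PdfTextMatcher {

    private PdfTextMatcher() {}

    /** Collapses every run of whitespace (including line breaks) into a single space. */
    static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    /** Returns true if expected occurs in extracted once whitespace is normalized in both. */
    static boolean containsNormalized(String extracted, String expected) {
        return normalize(extracted).contains(normalize(expected));
    }
}
```

In verifyPDFContent, the final check could then be written as PdfTextMatcher.containsNormalized(output, text) instead of output.contains(text), if this looser matching suits the data being validated.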