Java: How to read a PDF file using Selenium

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/40738373/

How to read a PDF file using Selenium

Tags: java, pdf, selenium-webdriver, download, pdf-reader

Asked by Bugasur

I am working on a web page that has a link; clicking it opens a PDF file in a new window. I have to read that PDF file to validate some data against the transactions done. One way is to download the file and then use it. Can anyone help me out with this? I have to work on IE 11.

Thanks in Advance.

Answered by Kenil Fadia

Use PDFBox and FontBox.

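    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;

    public String readPDFInURL() throws IOException {
        WebDriver driver = new FirefoxDriver();
        // page with example pdf document
        driver.get("file:///C:/Users/admin/Downloads/dotnet_TheRaceforEmpires.pdf");
        URL url = new URL(driver.getCurrentUrl());
        // declared outside the try block so it is still in scope for the return
        String output;
        // try-with-resources closes the document and both streams, even on failure
        try (InputStream is = url.openStream();
             BufferedInputStream fileToParse = new BufferedInputStream(is);
             PDDocument document = PDDocument.load(fileToParse)) {
            output = new PDFTextStripper().getText(document);
        }
        return output;
    }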

Some functions from older versions of PDFBox have been deprecated, so FontBox is needed alongside PDFBox. I used PDFBox 2.0.3 and FontBox 2.0.3 and it works fine. It won't read images, though.
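The question also mentions downloading the file first and then reading it. A minimal sketch of that step, using only the JDK, might look like the following. The class name `PdfDownloader` and the method `downloadToTempFile` are hypothetical, not part of the original answer, and the sketch assumes the PDF URL can be fetched without the browser's session (no login cookies).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of the original answer.
final class PdfDownloader {

    private PdfDownloader() {
    }

    /**
     * Copies the bytes behind pdfUrl into a new temporary file and returns its path.
     * Assumption: the URL is reachable without the browser's session cookies.
     */
    static Path downloadToTempFile(URL pdfUrl) throws IOException {
        // createTempFile already creates an empty file, hence REPLACE_EXISTING below
        Path target = Files.createTempFile("downloaded-", ".pdf");
        // try-with-resources closes the stream even if the copy fails
        try (InputStream in = pdfUrl.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

With PDFBox 2.x, the returned path could then be opened with `PDDocument.load(path.toFile())` instead of loading from a stream. Passing `new URL(driver.getCurrentUrl())` as the argument would mirror what the method above does.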

Answered by Ankit Gupta

First, download the PDFBox jar.

strURL is a web URL that points to a .pdf file, for example https://example.com/downloads/presence/Online-Presence-CA-05-02-2017-04-13.pdf

    public boolean verifyPDFContent(String strURL, String text) {
        String output = "";
        // try-with-resources closes the stream and the document automatically
        try (BufferedInputStream file = new BufferedInputStream(new URL(strURL).openStream());
             PDDocument document = PDDocument.load(file)) {
            output = new PDFTextStripper().getText(document);
            System.out.println(output);
        } catch (Exception e) {
            // on any failure output stays empty, so the check below fails
            e.printStackTrace();
        }
        // true only if the extracted text contains the expected string
        return output.contains(text);
    }
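Since the goal in the question is to validate several transaction values, a single `contains` check may not be enough, and extracted PDF text often carries line breaks or extra spaces where the page shows one space. The following is a minimal sketch of a whitespace-tolerant check over a list of expected values. The class `PdfTextChecks` and its methods are hypothetical helpers, not part of the original answer, and it assumes the text has already been extracted with `PDFTextStripper`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original answer.
final class PdfTextChecks {

    private PdfTextChecks() {
    }

    /** Collapses every run of whitespace (including line breaks) into one space. */
    static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    /** Returns the expected values that do not occur in the extracted text. */
    static List<String> missingValues(String extractedText, List<String> expectedValues) {
        String haystack = normalize(extractedText);
        List<String> missing = new ArrayList<>();
        for (String expected : expectedValues) {
            // compare normalized forms so spacing differences do not cause false failures
            if (!haystack.contains(normalize(expected))) {
                missing.add(expected);
            }
        }
        return missing;
    }
}
```

A test could then assert that `PdfTextChecks.missingValues(output, expected)` is empty, and on failure the returned list says which values were not found in the PDF.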