Java: a better way to retrieve column data from a web table using WebDriver

Question

Asked by user3188928

I'm trying to fetch data from a table into a List<List<String>> in Java. The code below works, but it takes 20+ seconds to fetch the data. Is there a faster way to fetch data from the table?

List<WebElement> rows = table.findElements(By.xpath(".//tbody//tr//td//.."));
List<ArrayList<String>> rowsData = new ArrayList<ArrayList<String>>();

for(WebElement row:rows){
    List<WebElement> rowElements = row.findElements(By.xpath(".//td"));

    ArrayList<String> rowData = new ArrayList<String>();

    for(WebElement column:rowElements){
        rowData.add(column.getText().toString());
    }

    rowsData.add(rowData);
}

return rowsData;

Answer 1

Answered by Priyanshu Shekhar

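First of all, your question is a bit surprising to me: how does it work? You have a leading "." in your XPaths; as far as I know, Selenium does need the "." when an XPath is meant to be evaluated relative to an element rather than the whole document. Anyway, to answer your question: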

If it is possible to use an element locator other than XPath, use that; it will likely reduce the execution time. Your code uses a for loop, and each iteration locates elements with XPath, which makes Selenium search the document again, so the execution time adds up.
If no locator other than XPath is possible, you can disable the implicit wait before performing the above operation. Since your code does not perform any action, such as a click, that refreshes the loaded page, there won't be any timing-related issues. Just make sure the required table DOM is completely loaded before performing the above operation.

Don't forget to re-enable the implicit wait after finishing the above.

It will be like this:

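// Disable the implicit wait so lookups that match nothing return immediately.
driver.manage().timeouts().implicitlyWait(0, TimeUnit.SECONDS);
List<ArrayList<String>> rowsData = new ArrayList<ArrayList<String>>();

try {
    // The leading "." keeps each XPath relative to the element it is called on;
    // without it, "//td" would match every cell in the whole document.
    List<WebElement> rows = table.findElements(By.xpath(".//tbody//tr//td//.."));

    for (WebElement row : rows) {
        List<WebElement> rowElements = row.findElements(By.xpath(".//td"));

        ArrayList<String> rowData = new ArrayList<String>();

        for (WebElement column : rowElements) {
            // getText() already returns a String, so toString() is not needed.
            rowData.add(column.getText());
        }

        rowsData.add(rowData);
    }
} finally {
    // Restore the implicit wait before returning; a statement placed after
    // the return would never run.
    driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
}

return rowsData;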

Answer 2

Answered by Saifur

I think JSoup is a better option for parsing larger HTML. It provides an API fairly similar to Selenium's.

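// Document, Element and Elements below are the org.jsoup types.
// outerHTML keeps the <table> tag itself, so the markup still parses as a table.
String html = driver.findElement(By.tagName("table")).getAttribute("outerHTML");
// Note: HashMap does not preserve row order; use LinkedHashMap if order matters.
HashMap<Element, ArrayList<String>> dict = new HashMap<>();

// Jsoup.parse() parses an HTML string; Jsoup.connect() expects a URL to fetch.
Document document = Jsoup.parse(html);
Elements table = document.select("table");

Elements rows = table.select("tr");

for (Element row : rows) {

    Elements list = row.select("td");
    ArrayList<String> newList = new ArrayList<>();

    // Collect the text of every cell in this row.
    for (Element cell : list) {
        newList.add(cell.text());
    }

    dict.put(row, newList);
}

return dict;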

Answer 3

Answered by Shawn Knight

I have written a blog post and created an example GitHub project describing this type of situation; they might help:

http://simpleseleniumnotes.blogspot.com/2015/02/interaction-with-html-tables.html https://github.com/5hawnknight/solid-prototype-table

Answer 4

Answered by Andrew

Look, the problem is caused by the slowness of Selenium. If you use a library to parse the grabbed HTML instead, the same algorithm can run about 1000 times faster.

Main idea:

Do all the work in Selenium except for parsing the table.
When you need to parse the table, take the innerHTML of that table via Selenium.
Parse this HTML with an external library.

In the case of C#, you can use HTMLAgilityPack. In the case of Java, you will need to search for a comparable library. Parsing this way, with the same algorithm, gave me results more than 1000 times faster.
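
As a rough illustration of the "grab the HTML once, parse it outside Selenium" idea, here is a minimal sketch in Java that uses only the standard library's XML parser. It is not the answerer's code. The class name TableHtmlParser and the method parseRows are hypothetical, and the sketch assumes the string is the table's outerHTML (a single root element) and that the markup happens to be well-formed XML, which real pages often are not. For ordinary HTML a tolerant parser such as JSoup is the safer choice.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: parses the markup of one table without any browser calls.
class TableHtmlParser {

    // tableHtml is assumed to be the string obtained once from Selenium,
    // e.g. via table.getAttribute("outerHTML").
    // ASSUMPTION: the markup is well-formed XML with <table> as its single root.
    static List<List<String>> parseRows(String tableHtml) {
        Document doc;
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(tableHtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Table markup is not well-formed XML", e);
        }

        List<List<String>> rowsData = new ArrayList<>();
        NodeList rows = doc.getElementsByTagName("tr");
        for (int i = 0; i < rows.getLength(); i++) {
            // Collect the trimmed text of every <td> inside this row.
            NodeList cells = ((Element) rows.item(i)).getElementsByTagName("td");
            List<String> rowData = new ArrayList<>();
            for (int j = 0; j < cells.getLength(); j++) {
                rowData.add(cells.item(j).getTextContent().trim());
            }
            rowsData.add(rowData);
        }
        return rowsData;
    }

    public static void main(String[] args) {
        // Stand-in for the string that Selenium would return.
        String html = "<table><tbody>"
                + "<tr><td>Alice</td><td>30</td></tr>"
                + "<tr><td>Bob</td><td>25</td></tr>"
                + "</tbody></table>";
        System.out.println(parseRows(html)); // prints [[Alice, 30], [Bob, 25]]
    }
}
```

The point of the design is that Selenium is asked for the table exactly once; every row and cell lookup after that happens in memory instead of going through the driver.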
