Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/37090653/

Time: 2020-08-19 18:49:40  Source: igfitidea

Iterating Through Table Rows in Selenium (Python)

Tags: python, selenium, xpath

Asked by Fiery Phoenix

I have a webpage with a table that only appears when I click 'Inspect Element' and is not visible through the View Source page. The table contains only two rows with several cells each and looks similar to this:

<table class="datadisplaytable">
<tbody>
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault">123</td>
<td class="dddefault">001</td>
<td class="dddefault">Calculus</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
<tr>
<td class="dddefault">16449</td>
<td class="dddefault">PHY</td>
<td class="dddefault">456</td>
<td class="dddefault">002</td>
<td class="dddefault">Physics</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
</tbody>
</table>

What I'm trying to do is iterate through the rows and return the text contained in each cell. I can't seem to do it with Selenium. The elements have no IDs, and I'm not sure how else to locate them. I'm not very familiar with XPath and the like.

Here is a debugging attempt that returns a TypeError:

def check_grades(self):
    table = []
    for i in self.driver.find_element_by_class_name("dddefault"):
        table.append(i)
    print(table)

What is an easy way to get the text from the rows?

Answered by Padraic Cunningham

If you want to go row by row using an xpath, you can use the following:

h  = """<table class="datadisplaytable">
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault">123</td>
<td class="dddefault">001</td>
<td class="dddefault">Calculus</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
<tr>
<td class="dddefault">16449</td>
<td class="dddefault">PHY</td>
<td class="dddefault">456</td>
<td class="dddefault">002</td>
<td class="dddefault">Physics</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
</table>"""

from lxml import html
xml = html.fromstring(h)
# gets the table
table =  xml.xpath("//table[@class='datadisplaytable']")[0]


# iterate over all the rows
for row in table.xpath(".//tr"):
    # get the text from every non-empty td in the row
    print([td.text for td in row.xpath(".//td[@class='dddefault'][text()]")])

Which outputs:

['16759', 'MATH', '123', '001', 'Calculus']
['16449', 'PHY', '456', '002', 'Physics']

Using td[text()] avoids getting None back for the td elements that hold no text.
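For readers without lxml, the same filtering idea can be sketched with the standard library's xml.etree.ElementTree. This is a swapped-in parser, not the answer's method: ElementTree's limited XPath does not support the [text()] predicate, so the sketch filters in Python instead. The name non_empty_cells and the trimmed sample markup are illustrative assumptions, and the approach assumes the table fragment is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Trimmed sample: one row modelled on the question's table (well-formed, so ET can parse it)
SAMPLE = """<table class="datadisplaytable">
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault"></td>
<td class="dddead"></td>
</tr>
</table>"""


def non_empty_cells(row):
    """Return the text of each 'dddefault' cell in a row, skipping empty cells."""
    cells = row.findall("./td[@class='dddefault']")
    # An empty <td></td> has text None; filtering here plays the role of [text()]
    return [td.text for td in cells if td.text]


table = ET.fromstring(SAMPLE)
for row in table.iter("tr"):
    unfiltered = [td.text for td in row.findall("./td[@class='dddefault']")]
    print(unfiltered)            # ['16759', 'MATH', None]
    print(non_empty_cells(row))  # ['16759', 'MATH']
```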

So, to do the same using Selenium, you would write:

table =  driver.find_element_by_xpath("//table[@class='datadisplaytable']")

for row in table.find_elements_by_xpath(".//tr"):
    print([td.text for td in row.find_elements_by_xpath(".//td[@class='dddefault'][1]")])

For multiple tables:

def get_row_data(table):
    for row in table.find_elements_by_xpath(".//tr"):
        # text of every non-empty td in the row
        yield [td.text for td in row.find_elements_by_xpath(".//td[@class='dddefault'][text()]")]


for table in driver.find_elements_by_xpath("//table[@class='datadisplaytable']"):
    for data in get_row_data(table):
        # use the data
        print(data)
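The find_element_by_* and find_elements_by_* helpers used in these answers were deprecated in Selenium 4 and removed in 4.3, so on a current install the same loop needs find_elements(By.XPATH, ...). The following is a minimal sketch under that assumption, not the answer author's code. The name iter_table_rows is hypothetical, and the fallback string "xpath" (the value of By.XPATH) is only there so the helper can be imported without Selenium present.

```python
try:
    from selenium.webdriver.common.by import By
    XPATH = By.XPATH
except ImportError:
    # Fallback so the helper can be exercised without Selenium installed;
    # "xpath" is the string value of By.XPATH.
    XPATH = "xpath"

TABLE_XPATH = "//table[@class='datadisplaytable']"
ROW_XPATH = ".//tr"
CELL_XPATH = ".//td[@class='dddefault'][text()]"


def iter_table_rows(driver):
    """Yield one list of cell texts per row, for every matching table."""
    # find_elements (plural) returns a list. The singular find_element returns
    # a single WebElement, which cannot be iterated over; that is the likely
    # cause of the TypeError in the question.
    for table in driver.find_elements(XPATH, TABLE_XPATH):
        for row in table.find_elements(XPATH, ROW_XPATH):
            yield [td.text for td in row.find_elements(XPATH, CELL_XPATH)]
```

With a live session, `for data in iter_table_rows(driver): print(data)` would print one list per row; `driver` is assumed to be an already-created WebDriver that has loaded the page.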

Answered by Harvey

XPath is fragile. It's better to use CSS selectors or classes:

mytable = driver.find_element_by_css_selector('table.datadisplaytable')
for row in mytable.find_elements_by_css_selector('tr'):
    for cell in row.find_elements_by_tag_name('td'):
        print(cell.text)

Answered by NellieK

Correction of the Selenium part of @Padraic Cunningham's answer:

table = driver.find_element_by_xpath("//table[@class='datadisplaytable']")

for row in table.find_elements_by_xpath(".//tr"):
    print([td.text for td in row.find_elements_by_xpath(".//td[@class='dddefault']")])

Note: a closing round bracket was missing at the end; I also removed the [1] index to match the first XML example.

Another note: the example with the [1] index is still worth keeping, since it shows how to extract individual elements.
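To illustrate what a positional index such as [1] selects, here is a small sketch that uses the standard library's xml.etree.ElementTree rather than Selenium (an assumption made so it runs without a browser). In XPath, td[1] picks the first td child of its parent, so applying it row by row returns one cell per row. The helper name column and the trimmed sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the question's table: two rows, three cells each
SAMPLE = """<table class="datadisplaytable">
<tr><td class="dddefault">16759</td><td class="dddefault">MATH</td><td class="dddefault">Calculus</td></tr>
<tr><td class="dddefault">16449</td><td class="dddefault">PHY</td><td class="dddefault">Physics</td></tr>
</table>"""


def column(table, position):
    """Return the text of the td at a 1-based position in every row."""
    values = []
    for row in table.iter("tr"):
        # XPath positions are 1-based: td[1] is the first td child of the row
        cell = row.find(f"./td[{position}]")
        if cell is not None:
            values.append(cell.text)
    return values


table = ET.fromstring(SAMPLE)
print(column(table, 1))  # ['16759', '16449']
print(column(table, 3))  # ['Calculus', 'Physics']
```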

Answered by user1457821

Another version (a modified and corrected version of Padraic Cunningham's post), tested with Python 3.x:

#!/usr/bin/python

h  = """<table class="datadisplaytable">
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault">123</td>
<td class="dddefault">001</td>
<td class="dddefault">Calculus</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
<tr>
<td class="dddefault">16449</td>
<td class="dddefault">PHY</td>
<td class="dddefault">456</td>
<td class="dddefault">002</td>
<td class="dddefault">Physics</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
</table>"""

from lxml import html
xml = html.fromstring(h)
# gets the table
table =  xml.xpath("//table[@class='datadisplaytable']")[0]


# iterate over all the rows
for row in table.xpath(".//tr"):
    # get the text from all the td's from each row
    print([td.text for td in row.xpath(".//td[@class='dddefault']")])