Iterating through table rows in Selenium (Python)

Question

Asked by Fiery Phoenix

I have a webpage with a table that only appears when I click 'Inspect Element' and is not visible through the View Source page. The table contains only two rows with several cells each and looks similar to this:

<table class="datadisplaytable">
<tbody>
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault">123</td>
<td class="dddefault">001</td>
<td class="dddefault">Calculus</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
<tr>
<td class="dddefault">16449</td>
<td class="dddefault">PHY</td>
<td class="dddefault">456</td>
<td class="dddefault">002</td>
<td class="dddefault">Physics</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
</tbody>
</table>

What I'm trying to do is iterate through the rows and return the text contained in each cell. I can't seem to do it with Selenium. The elements have no IDs, and I'm not sure how else to locate them. I'm not very familiar with XPath and the like.

Here is a debugging attempt that raises a TypeError:

def check_grades(self):
    table = []
    for i in self.driver.find_element_by_class_name("dddefault"):
        table.append(i)
    print(table)

What is an easy way to get the text from the rows?

Answer 1

Answered by Padraic Cunningham

If you want to go row by row using an xpath, you can use the following:

h  = """<table class="datadisplaytable">
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault">123</td>
<td class="dddefault">001</td>
<td class="dddefault">Calculus</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
<tr>
<td class="dddefault">16449</td>
<td class="dddefault">PHY</td>
<td class="dddefault">456</td>
<td class="dddefault">002</td>
<td class="dddefault">Physics</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
</table>"""

from lxml import html

xml = html.fromstring(h)
# get the table
table = xml.xpath("//table[@class='datadisplaytable']")[0]

# iterate over all the rows
for row in table.xpath(".//tr"):
    # get the text from every non-empty dddefault cell in the row
    print([td.text for td in row.xpath(".//td[@class='dddefault'][text()]")])

Which outputs:

['16759', 'MATH', '123', '001', 'Calculus']
['16449', 'PHY', '456', '002', 'Physics']

Using td[text()] avoids getting None returned for the td elements that hold no text.

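A small sketch of what that filter buys you. It assumes the standard library's xml.etree.ElementTree rather than lxml or Selenium, purely so the snippet runs without extra packages. ElementTree supports only a subset of XPath, so the filtering is done in Python here instead of inside the expression; the row markup is a shortened copy of the table from the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of one row from the question's table (illustrative only).
row = ET.fromstring(
    "<tr>"
    '<td class="dddefault">16759</td>'
    '<td class="dddefault">MATH</td>'
    '<td class="dddefault"></td>'
    '<td class="dddead"></td>'
    "</tr>"
)

cells = row.findall("td[@class='dddefault']")

# Without a text filter, the empty dddefault cell contributes None.
unfiltered = [td.text for td in cells]

# Keeping only cells that hold text mirrors what the [text()]
# predicate does inside the lxml XPath expression.
filtered = [td.text for td in cells if td.text]

print(unfiltered)  # ['16759', 'MATH', None]
print(filtered)    # ['16759', 'MATH']
```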

To do the same using Selenium:

table =  driver.find_element_by_xpath("//table[@class='datadisplaytable']")

for row in table.find_elements_by_xpath(".//tr"):
    print([td.text for td in row.find_elements_by_xpath(".//td[@class='dddefault'][1]"])

For multiple tables:

def get_row_data(table):
    for row in table.find_elements_by_xpath(".//tr"):
        yield [td.text for td in row.find_elements_by_xpath(".//td[@class='dddefault'][text()]")]


for table in driver.find_elements_by_xpath("//table[@class='datadisplaytable']"):
    for data in get_row_data(table):
        print(data)  # use the data

Answer 2

Answered by Harvey

XPath is fragile. It's better to use CSS selectors or classes:

mytable = driver.find_element_by_css_selector('table.datadisplaytable')
for row in mytable.find_elements_by_css_selector('tr'):
    for cell in row.find_elements_by_tag_name('td'):
        print(cell.text)
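
The find_element_by_* helpers used throughout this thread were removed in Selenium 4.3, so on a current install the same loop would be written with find_element / find_elements and a By locator. The following is only a sketch under that assumption: table_rows is a hypothetical helper name, driver is assumed to be an already-started WebDriver with the page loaded, and the try/except fallback exists solely so the snippet can be imported on a machine without Selenium.

```python
try:
    from selenium.webdriver.common.by import By
except ImportError:
    # Fallback so the sketch can be imported without Selenium installed;
    # these strings mirror the values of Selenium's By constants.
    class By:
        CSS_SELECTOR = "css selector"
        TAG_NAME = "tag name"


def table_rows(driver, table_css="table.datadisplaytable"):
    """Return the cell texts of each row of the first table matching table_css.

    Hypothetical helper: `driver` is assumed to be a live WebDriver
    that has already navigated to the page containing the table.
    """
    table = driver.find_element(By.CSS_SELECTOR, table_css)
    rows = []
    for row in table.find_elements(By.TAG_NAME, "tr"):
        rows.append([cell.text for cell in row.find_elements(By.TAG_NAME, "td")])
    return rows
```

With a real driver, table_rows(driver) would return one list of strings per tr, including empty strings for cells that hold no text.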

Answer 3

Answered by NellieK

Correction of the Selenium part of @Padraic Cunningham's answer:

table = driver.find_element_by_xpath("//table[@class='datadisplaytable']")

for row in table.find_elements_by_xpath(".//tr"):
    print([td.text for td in row.find_elements_by_xpath(".//td[@class='dddefault']")])

Note: there was one missing round bracket at the end; also removed the [1] index, to match the first XML example.

Another note: the example with the [1] index is still worth keeping, since it shows how to extract an individual element.

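To illustrate what a positional index selects, here is a rough sketch using the standard library's xml.etree.ElementTree (an assumption made so it runs without lxml or a browser). It uses the plain td[1] and td[2] forms rather than the combined class-plus-index predicate from the answer above, and the table is a trimmed copy of the question's data.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's table (illustrative only).
table = ET.fromstring("""<table class="datadisplaytable">
<tr><td class="dddefault">16759</td><td class="dddefault">MATH</td><td class="dddefault">123</td></tr>
<tr><td class="dddefault">16449</td><td class="dddefault">PHY</td><td class="dddefault">456</td></tr>
</table>""")

# td[1] selects the first td child of each row (XPath positions start at 1).
first_cells = [td.text for row in table.findall("tr") for td in row.findall("td[1]")]

# td[2] selects the second td child, here the subject code.
subjects = [row.find("td[2]").text for row in table.findall("tr")]

print(first_cells)  # ['16759', '16449']
print(subjects)     # ['MATH', 'PHY']
```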

Answer 4

Answered by user1457821

Another version (a modified and corrected copy of Padraic Cunningham's post), tested with Python 3.x:

#!/usr/bin/python

h  = """<table class="datadisplaytable">
<tr>
<td class="dddefault">16759</td>
<td class="dddefault">MATH</td>
<td class="dddefault">123</td>
<td class="dddefault">001</td>
<td class="dddefault">Calculus</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
<tr>
<td class="dddefault">16449</td>
<td class="dddefault">PHY</td>
<td class="dddefault">456</td>
<td class="dddefault">002</td>
<td class="dddefault">Physics</td>
<td class="dddefault"></td>
<td class="dddead"></td>
<td class="dddead"></td>
</tr>
</table>"""

from lxml import html

xml = html.fromstring(h)
# get the table
table = xml.xpath("//table[@class='datadisplaytable']")[0]

# iterate over all the rows
for row in table.xpath(".//tr"):
    # get the text from all the td's in each row
    print([td.text for td in row.xpath(".//td[@class='dddefault']")])
