How do I get the tbody from a table with Python Beautiful Soup?

Question

Asked by JPC

I'm trying to scrape Year & Winners (first & second columns) from the "List of finals matches" table (second table) at http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals. I'm using the code below:

import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://www.samhsa.gov/data/NSDUH/2k10State/NSDUHsae2010/NSDUHsaeAppC2010.htm"
soup = BeautifulSoup(urllib2.urlopen(url).read())
soup.findAll('table')[0].tbody.findAll('tr')
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    third_column = row.findAll('td')[2].contents
    print first_column, third_column

With the above code, I was able to get the first & third columns just fine. But when I use the same code with http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals, it cannot find tbody as an element, even though I can see the tbody when I inspect the element.

url = "http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals"
soup = BeautifulSoup(urllib2.urlopen(url).read())

print soup.findAll('table')[2]

soup.findAll('table')[2].tbody.findAll('tr')
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    third_column = row.findAll('td')[2].contents
    print first_column, third_column

Here's the error I got:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-150-fedd08c6da16> in <module>()
      7 # print soup.findAll('table')[2]
      8 
----> 9 soup.findAll('table')[2].tbody.findAll('tr')
     10 for row in soup.findAll('table')[0].tbody.findAll('tr'):
     11     first_column = row.findAll('th')[0].contents

AttributeError: 'NoneType' object has no attribute 'findAll'

Answer 1

Answered by Derek Litz

If you inspect the page with the browser's inspect tool, the browser will insert the tbody tags.

The source code may or may not contain them. I suggest looking at the source view if you really want to know.

Either way, you do not need to traverse to the tbody, simply:

soup.findAll('table')[0].findAll('tr') should work.
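To make the point concrete, here is a minimal sketch of why the .tbody lookup returns None while the direct row search still works. It assumes the newer bs4 package (where findAll is spelled find_all) and Python's built-in "html.parser"; the HTML string is a made-up stand-in for a page whose source has no tbody tag, not the real Wikipedia markup.

```python
from bs4 import BeautifulSoup

# Hypothetical page source: the table has rows but no <tbody> tag,
# which is what a server may send even though the browser inspector shows one.
html = """
<table>
  <tr><th>Year</th><th>Winners</th></tr>
  <tr><th>1930</th><td>Uruguay</td></tr>
</table>
"""

# "html.parser" is Python's built-in parser; it keeps the markup as written.
soup = BeautifulSoup(html, "html.parser")
table = soup.find_all("table")[0]

# No <tbody> in the source, so table.tbody is None here.
has_tbody = table.tbody is not None

# find_all searches all descendants, so it reaches the rows
# whether or not a <tbody> sits in between.
rows = table.find_all("tr")

print(has_tbody, len(rows))
```

Calling .findAll on that None is what raises the AttributeError in the question. Note that some parsers (html5lib, for instance) do add a tbody the way a browser would, so the result of table.tbody can depend on the parser you pick.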

Answer 2

Answered by GMPrazzoli

import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals"
soup = BeautifulSoup(urllib2.urlopen(url).read())
for tr in soup.findAll('table')[2].findAll('tr'):
    # get data from each row here
    pass

Then pull what you need out of each row :)

Answer 3

Answered by Rohit Yadav

Just run the code below.

tr_elements = soup.find_all('table')[2].find_all('tr')

This gives you access to all the <tr> elements; loop over them with a for loop (there are other ways to iterate too). Don't try to find the tbody; browsers add it by default, so it may not be in the page source.
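As a rough sketch of that loop applied to the original goal (Year and Winners), something like the following could work. It assumes bs4 with the built-in "html.parser", and the HTML is a simplified, hypothetical stand-in for the Wikipedia table: the real page may use different markup, and you would index [2] rather than [0] there, as in the answers above.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the finals table (assumed layout:
# year in a <th>, the other columns in <td> cells).
html = """
<table>
  <tr><th>Year</th><th>Winners</th><th>Runners-up</th></tr>
  <tr><th>1930</th><td>Uruguay</td><td>Argentina</td></tr>
  <tr><th>1934</th><td>Italy</td><td>Czechoslovakia</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
tr_elements = soup.find_all("table")[0].find_all("tr")

results = []
for row in tr_elements:
    # Rows with no <td> at all are treated as header rows and skipped.
    if not row.find_all("td"):
        continue
    # Collect <th> and <td> cells together, in document order.
    cells = row.find_all(["th", "td"])
    year = cells[0].get_text(strip=True)
    winner = cells[1].get_text(strip=True)
    results.append((year, winner))

print(results)
```

Collecting th and td cells in one call avoids having to know which tag a given column uses, which is where the question's row.findAll('th')[0] / row.findAll('td')[2] split gets fragile.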

Note:

If you have trouble getting to the desired tag, remove the tags that come before it with the .decompose() method.
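One way to read that note, sketched under the assumption that you are on bs4: decompose() removes a tag from the parsed tree, so an unwanted earlier table can be dropped and the one you care about becomes the first match. The two tables and their id values below are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical page with an unwanted table ahead of the one we want.
html = """
<table id="nav"><tr><td>menu</td></tr></table>
<table id="data"><tr><td>1930</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# decompose() removes the tag and its contents from the tree.
soup.find("table").decompose()

# The table that used to be second is now the first match.
first_table = soup.find("table")
remaining_id = first_table["id"]

print(remaining_id)
```

This changes the soup in place, so any index like [2] computed before the call no longer points at the same table afterwards.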
