Python BeautifulSoup: get span content

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/22259384/

Time: 2020-08-19 00:36:15  Source: igfitidea

Beautifulsoup get span content

Tags: python, html, beautifulsoup, html-parsing

Asked by add-semi-colons

I have parsed an HTML page using BeautifulSoup:

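from urllib.request import urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup as bs

user_page = urlopen(user_url)  # user_url is defined elsewhere in the asker's code
souping_page = bs(user_page, 'html.parser')
badges = souping_page.body.find('div', attrs={'class': 'badges'})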

After this, my badges object looks like this:

<span><span title="9 gold badges"><span class="badge1"></span><span class="badgecount">9</span></span><span title="38 silver badges"><span class="badge2"></span><span class="badgecount">38</span></span><span title="56 bronze badges"><span class="badge3"></span><span class="badgecount">56</span></span></span>

Now I want to extract, for example, "9 gold badges" and "38 silver badges" from this. I tried badges.span.span, but that doesn't work.

Accepted answer by alecxe

Get the parent span from badges, then find all top-level spans inside it by calling find_all() with recursive=False:

from bs4 import BeautifulSoup


page = """<div class="badges">
<span>
    <span title="9 gold badges"><span class="badge1"></span><span class="badgecount">9</span></span>
    <span title="38 silver badges"><span class="badge2"></span><span class="badgecount">38</span></span>
    <span title="56 bronze badges"><span class="badge3"></span><span class="badgecount">56</span></span>
</span>
</div>"""

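soup = BeautifulSoup(page, 'html.parser')  # name the parser explicitly
# html.parser does not add a <body> to a fragment, so search from the soup itself
badges = soup.find('div', attrs={'class': 'badges'})
# recursive=False limits the search to direct children of the outer span
for span in badges.span.find_all('span', recursive=False):
    print(span.attrs['title'])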

Prints:

9 gold badges
38 silver badges
56 bronze badges
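
To see why recursive=False matters here, the following is a minimal sketch that compares it with the default recursive search on the same sample markup. The variable names (outer, all_spans, top_level, selected) are illustrative and not from the original answer, and the CSS selector at the end is an alternative added for comparison, assuming a bs4 version with select() support.

```python
from bs4 import BeautifulSoup

page = """<div class="badges">
<span>
    <span title="9 gold badges"><span class="badge1"></span><span class="badgecount">9</span></span>
    <span title="38 silver badges"><span class="badge2"></span><span class="badgecount">38</span></span>
    <span title="56 bronze badges"><span class="badge3"></span><span class="badgecount">56</span></span>
</span>
</div>"""

soup = BeautifulSoup(page, "html.parser")
outer = soup.find("div", class_="badges").span  # the outermost span

# Default search (recursive=True): every descendant span, including the
# nested badge1/badgecount spans that have no title attribute.
all_spans = outer.find_all("span")

# recursive=False: only the direct children of the outer span.
top_level = outer.find_all("span", recursive=False)
titles = [span["title"] for span in top_level]

# Alternative (assumption, not in the original answer): the CSS child
# combinator ">" expresses the same "direct children only" constraint.
selected = [span["title"] for span in soup.select("div.badges > span > span")]

print(len(all_spans), len(top_level))  # 9 3
print(titles)  # ['9 gold badges', '38 silver badges', '56 bronze badges']
```

With the default recursive search, span["title"] would raise a KeyError on the nested badge spans, which is why restricting the search to direct children is the simpler route.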

Hope that helps.