Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/22217713/

Date: 2020-08-19 00:30:06  Source: igfitidea

How to select a class of div inside of a div with beautiful soup?

python beautifulsoup

Asked by parap

I have a bunch of div tags within div tags:

<div class="foo">
     <div class="bar">I want this</div>
     <div class="unwanted">Not this</div>
</div>
<div class="bar">Don't want this either
</div>

So I'm using Python and Beautiful Soup to separate things out. I need every div of class "bar", but only when it is wrapped inside a div of class "foo". Here's my code:

from bs4 import BeautifulSoup
soup = BeautifulSoup(open(r'C:\test.htm'))
tag = soup.div
for each_div in soup.findAll('div',{'class':'foo'}):
    print(tag["bar"]).encode("utf-8")

Alternately, I tried:

from bs4 import BeautifulSoup
soup = BeautifulSoup(open(r'C:\test.htm'))
for each_div in soup.findAll('div',{'class':'foo'}):
     print(each_div.findAll('div',{'class':'bar'})).encode("utf-8")

What am I doing wrong? I would be just as happy with a simple print(each_div) if I could remove the div of class "unwanted" from the selection.

Accepted answer by Birei

You can use find_all() to search for every <div> element with class foo, and for each of them use find() to get the <div> with class bar, like:

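from bs4 import BeautifulSoup
import sys

# Parse the HTML file given on the command line with the built-in parser.
with open(sys.argv[1], 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')

for foo in soup.find_all('div', attrs={'class': 'foo'}):
    # find() returns only the first match, or None if there is none.
    bar = foo.find('div', attrs={'class': 'bar'})
    if bar is not None:
        print(bar.text)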

Run it like:

python3 script.py htmlfile

That yields:

I want this


UPDATE: If there can be several <div> elements with class bar inside a foo, the previous script won't work: it will only find the first one. But you could get the descendants of each foo element and iterate over them, like:

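from bs4 import BeautifulSoup
import sys

# Parse the HTML file given on the command line with the built-in parser.
with open(sys.argv[1], 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')

for foo in soup.find_all('div', attrs={'class': 'foo'}):
    # Walk every node nested under this foo div.
    for d in foo.descendants:
        # Only <div> tags whose class list contains 'bar' are printed.
        if d.name == 'div' and 'bar' in d.get('class', []):
            print(d.text)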

With an input like:

<div class="foo">
     <div class="bar">I want this</div>
     <div class="unwanted">Not this</div>
     <div class="bar">Also want this</div>
</div>

It will yield:

I want this
Also want this
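
The same selection can probably be written more compactly. The sketch below is not part of the original answer: it assumes the markup is available as a string (the HTML constant simply combines the samples from the question and the update), and the helper names bars_with_find_all and bars_with_select are made up for illustration. The first helper nests find_all() calls, the second uses a CSS descendant selector through select().

```python
from bs4 import BeautifulSoup

# Sample markup combined from the question and the answer's update
# (an assumption for this sketch; the answer reads a file instead).
HTML = """
<div class="foo">
     <div class="bar">I want this</div>
     <div class="unwanted">Not this</div>
     <div class="bar">Also want this</div>
</div>
<div class="bar">Don't want this either
</div>
"""


def bars_with_find_all(soup):
    """Collect the text of every div.bar nested inside a div.foo."""
    texts = []
    for foo in soup.find_all('div', class_='foo'):
        # Searching from foo only sees tags nested under it.
        for bar in foo.find_all('div', class_='bar'):
            texts.append(bar.get_text(strip=True))
    return texts


def bars_with_select(soup):
    """Same selection expressed as a CSS descendant selector."""
    return [bar.get_text(strip=True) for bar in soup.select('div.foo div.bar')]


soup = BeautifulSoup(HTML, 'html.parser')
print(bars_with_find_all(soup))
print(bars_with_select(soup))
```

On this sample both calls print ['I want this', 'Also want this']; the top-level bar div is skipped because it has no foo ancestor. If foo divs can be nested inside each other, the find_all() version would report an inner bar once per enclosing foo, so the select() form may be the safer choice there.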