Python: Understanding the find() function in Beautiful Soup

Question

Asked by OneManRiot

I know what I'm trying to do is simple, but it's causing me grief. I'd like to pull data from HTML using BeautifulSoup. To do that I need to use the .find() function properly. Here's the HTML I'm working with:

<div class="audit">

    <div class="profile-info">
        <img class="profile-pic" src="https://pbs.twimg.com/profile_images/471758097036226560/tLLeiOiL_normal.jpeg" />
        <h4>Ed Boon</h4>
        <span class="screen-name"><a href="http://www.twitter.com/noobde" target="_blank">@noobde</a></span>
    </div>

        <div class="followers">
            <div class="pie"></div>
            <div class="pie-data">
                <span class="real number" data-value=73599>73,599</span><span class="real"> Real</span><br />
                <span class="fake number" data-value=32452>32,452</span><span class="fake"> Fake</span><br />
                <h6>Followers</h6>
            </div>
        </div>
        <div class="score">
            <img src="//twitteraudit-prod.s3.amazonaws.com/dist/f977287de6281fe3e1ef36d48d996fb83dd6a876/img/audit-result-good.png" />
            <div class="percentage good">
                69%
            </div>
            <h6>Audit score</h6>

The values I want are 73599 from data-value=73599, 32452 from data-value=32452, and the 69% from percentage good.

Using past code and online examples, this is what I have so far:

RealValue = soup.find("div", {"class":"real number"})['data-value']
FakeValue = soup.find("audit", {"class":"fake number"})['data-value']

Neither has worked so far. I'm also not sure how to write the find() call to pull out the 69% value.

Answer 1

Accepted answer by alecxe

soup.find("div", {"class":"real number"})['data-value']

Here you are searching for a div element, but in your example HTML it is the span that has the "real number" class. Try this instead:

soup.find("span", {"class": "real number", "data-value": True})['data-value']

Here we are also checking for the presence of the data-value attribute.
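As a rough, self-contained sketch of the above (the html string is a trimmed copy of the markup from the question, and the helper name data_value is made up for illustration), both counts can be pulled the same way. It assumes the built-in "html.parser" backend:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question.
html = """
<div class="pie-data">
    <span class="real number" data-value=73599>73,599</span><span class="real"> Real</span><br />
    <span class="fake number" data-value=32452>32,452</span><span class="fake"> Fake</span><br />
    <h6>Followers</h6>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def data_value(doc, css_class):
    """Hypothetical helper: data-value of the first <span> whose class attribute is exactly css_class."""
    tag = doc.find("span", {"class": css_class, "data-value": True})
    # find() returns None when nothing matches, so guard before indexing.
    return int(tag["data-value"]) if tag is not None else None


real_value = data_value(soup, "real number")
fake_value = data_value(soup, "fake number")
print(real_value, fake_value)  # 73599 32452
```

Note that the original FakeValue line passed "audit" as the tag name. The first argument to find() is a tag name, and there is no audit tag in the markup, so that call returns None and indexing it raises a TypeError.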

To find elements having the "real number" or "fake number" classes, you can use a CSS selector:

for elm in soup.select(".real.number,.fake.number"):
    print(elm.get("data-value"))
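If it helps, here is a small sketch that collects the selector results into a dict keyed by the first class name. The counts variable and the trimmed html string are illustrative, not part of the original code:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question.
html = """
<div class="pie-data">
    <span class="real number" data-value=73599>73,599</span><span class="real"> Real</span><br />
    <span class="fake number" data-value=32452>32,452</span><span class="fake"> Fake</span><br />
</div>
"""

soup = BeautifulSoup(html, "html.parser")

counts = {}
for elm in soup.select(".real.number, .fake.number"):
    # "class" is a multi-valued attribute, so bs4 returns a list such as ["real", "number"].
    label = elm["class"][0]
    counts[label] = int(elm["data-value"])

print(counts)  # {'real': 73599, 'fake': 32452}
```

The selector .real.number only matches elements carrying both classes, so the plain span class="real" and span class="fake" label elements are skipped.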

To get the 69% value:

soup.find("div", {"class": "percentage good"}).get_text(strip=True)

Or, a CSS selector:

soup.select_one(".percentage.good").get_text(strip=True)
soup.select_one(".score .percentage").get_text(strip=True)

Or, locate the h6 element containing the text Audit score and take its preceding sibling div (find_previous_sibling() skips the whitespace text node that sits directly before the h6 in this markup):

soup.find("h6", string="Audit score").find_previous_sibling("div").get_text(strip=True)
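To compare the approaches side by side, here is a minimal sketch against a trimmed, closed version of the score block (the question's HTML is cut off, so the closing tags here are an assumption). The sibling lookup uses find_previous_sibling("div") and the string= argument, because the node immediately before the h6 in this markup is whitespace text rather than the div:

```python
from bs4 import BeautifulSoup

# Trimmed version of the score block; the closing </div> is assumed.
html = """
<div class="score">
    <div class="percentage good">
        69%
    </div>
    <h6>Audit score</h6>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# 1. Exact match on the class attribute string.
by_find = soup.find("div", {"class": "percentage good"}).get_text(strip=True)

# 2. CSS selector: any .percentage element inside .score.
by_css = soup.select_one(".score .percentage").get_text(strip=True)

# 3. Start from the heading and walk back to the nearest preceding <div> sibling.
heading = soup.find("h6", string="Audit score")
by_sibling = heading.find_previous_sibling("div").get_text(strip=True)

# Optional: turn "69%" into an integer.
score = int(by_find.rstrip("%"))
print(by_find, by_css, by_sibling, score)  # 69% 69% 69% 69
```

Which one to prefer mostly depends on which part of the page is least likely to change: the class names, the nesting, or the heading text.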
