Python: getting the text of children in a div with BeautifulSoup

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/20889790/

Date: 2020-08-18 21:28:08  Source: igfitidea

Get text of children in a div with BeautifulSoup

python html beautifulsoup urllib2

Asked by Si Mon

Hi, I want the description of an app in the Google Play Store (https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de).

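import urllib2  # Python 2 only
from bs4 import BeautifulSoup

# Download the page and parse it with the built-in HTML parser
soup = BeautifulSoup(urllib2.urlopen("https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de"), "html.parser")
# find_all() returns a ResultSet (a list of matching tags), not a single tag
result = soup.find_all("div", {"class":"show-more-content text-body"})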

With this code I get the whole content of the element with this class, but I can't get only the text in it. I tried a lot of things with next_sibling or .text, but it always throws errors (ResultSet has no attribute xxx).

I just want to get the text like this: "Die Android App von wetter.com! Sie erhalten: ..:"

Can anyone help me?

Answered by Martijn Pieters

Use the .text attribute on the elements; find_all() gives you a list of results, so loop over it:

for res in result:
    print res.text
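
The loop above can be tried without network access. The following is a minimal sketch, assuming Python 3 and a recent bs4; the inline HTML is a hypothetical stand-in for the downloaded page, and the real Play Store markup may differ. It shows that the ResultSet itself has no text, while each Tag inside it does:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
html = """
<div class="show-more-content text-body">
  <div>Die Android App von wetter.com!</div>
  <p>Sie erhalten:</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list subclass); it has no .text of its own.
results = soup.find_all("div", {"class": "show-more-content text-body"})

# Ask each Tag for its text. The " " separator joins the children's text,
# and strip=True trims whitespace and drops empty fragments.
texts = [tag.get_text(" ", strip=True) for tag in results]
print(texts)
```

get_text() is used here instead of .text only because it accepts a separator, which keeps the text of adjacent child elements from running together.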

Alternatively, if there is only ever supposed to be one such <div>, use .find() instead of .find_all():

result = soup.find("div", {"class":"show-more-content text-body"})
print result.text
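
One caveat with .find() is that it returns None when nothing matches, so calling .text on the result would then raise an AttributeError. A small sketch of a guard, assuming Python 3; the helper name first_text and the inline HTML are made up for illustration and are not part of the original answer:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = '<div class="show-more-content text-body"><p>Sie erhalten:</p></div>'
soup = BeautifulSoup(html, "html.parser")


def first_text(soup, css_class):
    """Return the text of the first matching <div>, or None if there is none."""
    tag = soup.find("div", {"class": css_class})
    if tag is None:  # find() returns None rather than an empty list
        return None
    return tag.get_text(" ", strip=True)


description = first_text(soup, "show-more-content text-body")
missing = first_text(soup, "no-such-class")
print(description, missing)
```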