Get the text of the children in a div with BeautifulSoup
Source: http://stackoverflow.com/questions/20889790/
This page reproduces a popular Stack Overflow question and its answer, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors on Stack Overflow (not to this page).
Asked by Si Mon
Hi, I want to get the description of an app in the Google Play Store (https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de).
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen("https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de"))
result = soup.find_all("div", {"class":"show-more-content text-body"})
With this code I get the whole content of elements with this class, but I can't get only the text inside them. I tried a lot of things with next_sibling or .text, but it always throws errors (ResultSet has no attribute xxx).
I just want to get the text, like this: "Die Android App von wetter.com! Sie erhalten: ..:"
Can anyone help me?
Answered by Martijn Pieters
Use the .text attribute on the elements. find_all() returns a list of results, not a single element, so loop over it:
for res in result:
    print(res.text)
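As a self-contained illustration of the loop above, here is a minimal Python 3 sketch that parses a small inline HTML snippet instead of the live Play Store page. The markup and the sample text are made up for the example (an assumption, since the real page structure is not shown in the question); only find_all() and .text come from the answer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the Play Store page (an assumption).
html = """
<div class="show-more-content text-body">
  <div>Die Android App von wetter.com!</div>
  <p>Sie erhalten: ...</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
results = soup.find_all("div", {"class": "show-more-content text-body"})

# find_all() returns a ResultSet (a list of Tag objects), so .text has to be
# read from each element rather than from the ResultSet itself.
for res in results:
    print(res.text)
```

Calling results.text directly would raise an AttributeError, which is likely the "ResultSet has no attribute" error described in the question.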
Alternatively, if there is only ever supposed to be one such <div>, use .find() instead of .find_all():
result = soup.find("div", {"class":"show-more-content text-body"})
print(result.text)
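One thing to watch for with .find() is that it returns None when nothing matches, so accessing .text on the result can fail if the page layout changes. The sketch below, written for Python 3, guards against that case and uses get_text() with a separator to collapse whitespace between child elements. The helper name description_text and the inline HTML are hypothetical and exist only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page structure may differ.
html = (
    '<div class="show-more-content text-body">'
    "<div>Die Android App von wetter.com!</div>"
    "<p>Sie erhalten: ...</p>"
    "</div>"
)


def description_text(soup):
    """Return the description text, or None if the <div> is absent."""
    div = soup.find("div", {"class": "show-more-content text-body"})
    if div is None:
        # find() returns None when no element matches.
        return None
    # Join the text of all child elements with single spaces.
    return div.get_text(" ", strip=True)


soup = BeautifulSoup(html, "html.parser")
print(description_text(soup))

# A document without the target <div> yields None instead of raising.
missing = description_text(BeautifulSoup("<p>no description</p>", "html.parser"))
```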

