Original URL: http://stackoverflow.com/questions/42175190/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Get value of span tag using BeautifulSoup
Asked by newaccount1111
I have a number of Facebook groups whose member counts I would like to get. An example would be this group: https://www.facebook.com/groups/347805588637627/ I have looked at the page with Inspect Element, and the count is stored like so:
<span id="count_text">9,413 members</span>
I am trying to get "9,413 members" out of the page. I have tried using BeautifulSoup but cannot work it out.
Thanks
Edit:
from bs4 import BeautifulSoup
import requests
url = "https://www.facebook.com/groups/347805588637627/"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")
span = soup.find("span", id="count_text")
print(span.text)
Answered by Henrik
In case there is more than one span tag in the page:
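from bs4 import BeautifulSoup
# your_html_input is the page's HTML as a string
soup = BeautifulSoup(your_html_input, 'html.parser')
# Select the span by its id instead of taking the first span on the page
span = soup.find("span", id="count_text")
print(span.text)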
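Note that find returns None when nothing matches, so span.text raises AttributeError if the id is missing from the HTML that was actually downloaded. Below is a minimal sketch of a guarded lookup; the helper name span_text_by_id and the sample HTML are made up for illustration and do not come from the original answer.

```python
from bs4 import BeautifulSoup


def span_text_by_id(html, span_id):
    """Return the stripped text of <span id=span_id>, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", id=span_id)
    if span is None:
        # No matching span in this HTML; let the caller decide what to do.
        return None
    return span.get_text(strip=True)


# Hypothetical sample input with more than one span on the page.
sample_html = (
    '<div><span class="title">Group</span>'
    '<span id="count_text">9,413 members</span></div>'
)
print(span_text_by_id(sample_html, "count_text"))
print(span_text_by_id(sample_html, "missing_id"))
```

Returning None rather than raising keeps the "element not found" case explicit, which is useful when the downloaded page differs from what the browser shows.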
Answered by Balthazar Rouberol
You can use the text attribute of the parsed span:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<span id="count_text">9,413 members</span>', 'html.parser')
>>> soup.span
<span id="count_text">9,413 members</span>
>>> soup.span.text
'9,413 members'
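Since the question asks for the member count, the extracted text may still need converting to a number. The sketch below assumes the text contains a single number with commas as thousands separators, as in the example; parse_member_count is a hypothetical helper, not part of BeautifulSoup.

```python
import re

from bs4 import BeautifulSoup


def parse_member_count(text):
    """Return the first number found in text such as '9,413 members' as an int."""
    # Match a digit followed by any run of digits and commas, e.g. "9,413".
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    # Drop the thousands separators before converting.
    return int(match.group().replace(",", ""))


soup = BeautifulSoup('<span id="count_text">9,413 members</span>', "html.parser")
count = parse_member_count(soup.span.text)
print(count)
```

Other locales may format the number differently (for example with periods or spaces as separators), in which case the pattern would need adjusting.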
Answered by Tessaracter
Facebook uses JavaScript to prevent bots from scraping, so the HTML returned by a plain HTTP request may not contain the span. You need to use Selenium to extract the data in Python.
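A rough sketch of the Selenium approach follows. It assumes Selenium 4 is installed, that Chrome and a matching driver are available, that the group page can be viewed without logging in, and that the span still uses the id count_text from the question; none of this is guaranteed. The function names fetch_rendered_html and extract_count_text are hypothetical.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url):
    """Load url in a real browser so JavaScript runs, then return the page HTML."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are available
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()


def extract_count_text(html):
    """Return the text of <span id="count_text">, or None if it is not present."""
    span = BeautifulSoup(html, "html.parser").find("span", id="count_text")
    return span.get_text(strip=True) if span is not None else None


# Parsing demonstrated on static HTML; no browser is needed for this part.
print(extract_count_text('<span id="count_text">9,413 members</span>'))
```

For the live page, the call would be extract_count_text(fetch_rendered_html(url)) with the group URL from the question. If the count is filled in after the initial page load, an explicit wait (Selenium's WebDriverWait) before reading page_source may be needed.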
Answered by Karim Elgazar
If you have more than one span tag, you can try this:
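from bs4 import BeautifulSoup
# html is the page's HTML as a string
soup = BeautifulSoup(html, 'html.parser')
# Calling the soup object is shorthand for soup.find_all('span')
tags = soup('span')
for tag in tags:
    # contents[0] is the first child node; skip empty spans to avoid an IndexError
    if tag.contents:
        print(tag.contents[0])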