Python: How to get text from a span tag in BeautifulSoup
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/38133759/
Asked by GLHF
I have markup that looks like this:
<div class="systemRequirementsMainBox">
<div class="systemRequirementsRamContent">
<span title="000 Plus Minimum RAM Requirement">1 GB</span> </div>
I'm trying to get 1 GB from there. I tried:
tt = [a['title'] for a in soup.select(".systemRequirementsRamContent span")]
for ram in tt:
    if "RAM" in ram.split():
        print(soup.string)
It outputs None.
I also tried a['text'], but it gives me a KeyError. How can I fix this, and what is my mistake?
Answered by Padraic Cunningham
You can use a CSS selector, pulling the span you want by its title text:
from bs4 import BeautifulSoup

# Note: the "xml" parser requires lxml to be installed.
soup = BeautifulSoup("""<div class="systemRequirementsMainBox">
<div class="systemRequirementsRamContent">
<span title="000 Plus Minimum RAM Requirement">1 GB</span> </div>""", "xml")
print(soup.select_one("span[title*=RAM]").text)
That finds the span with a title attribute that contains RAM; it is equivalent to saying in Python, if "RAM" in span["title"].
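As a rough sketch of that equivalence, the same filter can be written as a plain Python loop over the spans. The sample markup is the snippet from the question with the outer div closed, and "html.parser" is used here only so the example has no extra dependencies; both are choices made for illustration.

```python
from bs4 import BeautifulSoup

html = """<div class="systemRequirementsMainBox">
<div class="systemRequirementsRamContent">
<span title="000 Plus Minimum RAM Requirement">1 GB</span> </div></div>"""
soup = BeautifulSoup(html, "html.parser")

# Manual version of span[title*=RAM]: keep spans whose title contains "RAM".
# span.get("title", "") avoids a KeyError for spans without a title attribute.
matches = [span for span in soup.find_all("span") if "RAM" in span.get("title", "")]
ram_text = matches[0].text
print(ram_text)
```

The selector form is shorter, but the loop makes it easier to add other conditions later.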
Or using find with re.compile:
import re
print(soup.find("span", title=re.compile("RAM")).text)
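On the "what is my mistake" part of the question, a small sketch may help. It assumes a cut-down version of the markup and the built-in "html.parser". Subscripting a tag looks up HTML attributes only, so a['text'] fails, while the text itself is available through .text. Likewise, .string is None when there is no single string to return.

```python
from bs4 import BeautifulSoup

html = '<div><span title="000 Plus Minimum RAM Requirement">1 GB</span> </div>'
soup = BeautifulSoup(html, "html.parser")
span = soup.find("span")

print(span["title"])  # attribute lookup: the value of title
print(span.text)      # the text inside the tag

# Subscripting only looks up attributes, and "text" is not an attribute here.
try:
    span["text"]
    raised = False
except KeyError:
    raised = True
print(raised)

# soup.string follows single children downward; the div has two children
# (the span and a trailing space), so there is no single string and the
# result is None.
print(soup.string)
```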
To get all the data:
import requests
from bs4 import BeautifulSoup

r = requests.get("http://www.game-debate.com/games/index.php?g_id=21580&game=000%20Plus").content
soup = BeautifulSoup(r, "lxml")
# The RAM requirement sits in its own div.
cont = soup.select_one("div.systemRequirementsRamContent")
ram = cont.select_one("span")
print(ram["title"], ram.text)
# The remaining requirements share a common pair of classes.
for span in soup.select("div.systemRequirementsSmallerBox.sysReqGameSmallBox span"):
    print(span["title"], span.text)
Which will give you:
000 Plus Minimum RAM Requirement 1 GB
000 Plus Minimum Operating System Requirement Win Xp 32
000 Plus Minimum Direct X Requirement DX 9
000 Plus Minimum Hard Disk Drive Space Requirement 500 MB
000 Plus GD Adjusted Operating System Requirement Win Xp 32
000 Plus GD Adjusted Direct X Requirement DX 9
000 Plus GD Adjusted Hard Disk Drive Space Requirement 500 MB
000 Plus Recommended Operating System Requirement Win Xp 32
000 Plus Recommended Hard Disk Drive Space Requirement 500 MB
Answered by Abu Shoeb
You can simply use the span tag in BeautifulSoup, or you can include other attributes like class and title along with the span tag.
from bs4 import BeautifulSoup as BSHTML

htmlText = """<div class="systemRequirementsMainBox">
<div class="systemRequirementsRamContent">
<span title="000 Plus Minimum RAM Requirement">1 GB</span> </div>"""
soup = BSHTML(htmlText, "html.parser")
spans = soup.find_all('span')
# spans = soup.find_all('span', attrs={'class': 'your-class-name'})  # or span by class name
# spans = soup.find_all('span', attrs={'title': '000 Plus Minimum RAM Requirement'})  # or span with a title
for span in spans:
    print(span.text)
Answered by ganesh sai
contents[0]' after iterating over all the tags in the folder.
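The answer above appears to be cut off, so what follows is only a guess at its intent: a sketch assuming it refers to the .contents list that bs4 exposes on every tag. The sample markup is taken from the question, and "html.parser" is an arbitrary choice for illustration.

```python
from bs4 import BeautifulSoup

html = """<div class="systemRequirementsRamContent">
<span title="000 Plus Minimum RAM Requirement">1 GB</span> </div>"""
soup = BeautifulSoup(html, "html.parser")

# .contents is the list of a tag's direct children; for a span that holds
# only text, the first child is that text. Empty spans are skipped so that
# contents[0] cannot raise an IndexError.
texts = [str(span.contents[0]) for span in soup.find_all("span") if span.contents]
print(texts)
```

For spans that contain nested tags, contents[0] may be a tag rather than text, so .text is usually the safer choice.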