Python: How to get text from a span tag in BeautifulSoup

Notice: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/38133759/

Time: 2020-08-19 20:23:42  Source: igfitidea

How to get text from span tag in BeautifulSoup

Tags: python, web-scraping, beautifulsoup, python-3.4

Asked by GLHF

I have markup that looks like this:

<div class="systemRequirementsMainBox">
<div class="systemRequirementsRamContent">
<span title="000 Plus Minimum RAM Requirement">1 GB</span> </div>

I'm trying to get 1 GB from there. I tried:

tt  = [a['title'] for a in soup.select(".systemRequirementsRamContent span")]
for ram in tt:
    if "RAM" in ram.split():
        print (soup.string)

It outputs None.

I tried a['text'] but it gives me a KeyError. How can I fix this, and what is my mistake?

Answered by Padraic Cunningham

You can use a CSS selector to pull the span you want by matching on its title text:

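from bs4 import BeautifulSoup

soup = BeautifulSoup("""<div class="systemRequirementsMainBox">
<div class="systemRequirementsRamContent">
<span title="000 Plus Minimum RAM Requirement">1 GB</span> </div>""", "html.parser")

# Select the first span whose title attribute contains "RAM" and print its text
print(soup.select_one('span[title*="RAM"]').text)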

That finds the span with a title attribute that contains RAM. It is equivalent to writing, in Python, if "RAM" in span["title"].
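To make that equivalence concrete, here is a minimal sketch that runs the selector and a hand-written loop side by side. It assumes the question's HTML (with the outer div closed) and the built-in html.parser; the variable names by_selector and by_loop are illustrative only.

```python
from bs4 import BeautifulSoup

html = """<div class="systemRequirementsMainBox">
<div class="systemRequirementsRamContent">
<span title="000 Plus Minimum RAM Requirement">1 GB</span> </div></div>"""

soup = BeautifulSoup(html, "html.parser")

# Selector form: substring match on the title attribute.
by_selector = [span.text for span in soup.select('span[title*="RAM"]')]

# Hand-written form: the same substring test done in Python.
# .get() avoids a KeyError for spans that have no title attribute.
by_loop = [
    span.text
    for span in soup.find_all("span")
    if "RAM" in span.get("title", "")
]

print(by_selector, by_loop)
```

Both lists should hold the same text, so the selector is usually the shorter way to express the filter.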

Or use find with re.compile:

import re
print(soup.find("span", title=re.compile("RAM")).text)

To get all the data:

import requests
from bs4 import BeautifulSoup

r = requests.get("http://www.game-debate.com/games/index.php?g_id=21580&game=000%20Plus").content

soup = BeautifulSoup(r, "lxml")
# The RAM requirement sits in its own container
cont = soup.select_one("div.systemRequirementsRamContent")
ram = cont.select_one("span")
print(ram["title"], ram.text)
# The remaining requirements are spans inside the smaller boxes
for span in soup.select("div.systemRequirementsSmallerBox.sysReqGameSmallBox span"):
    print(span["title"], span.text)

Which will give you:

000 Plus Minimum RAM Requirement 1 GB
000 Plus Minimum Operating System Requirement Win Xp 32
000 Plus Minimum Direct X Requirement DX 9
000 Plus Minimum Hard Disk Drive Space Requirement 500 MB
000 Plus GD Adjusted Operating System Requirement Win Xp 32
000 Plus GD Adjusted Direct X Requirement DX 9
000 Plus GD Adjusted Hard Disk Drive Space Requirement 500 MB
000 Plus Recommended Operating System Requirement Win Xp 32
000 Plus Recommended Hard Disk Drive Space Requirement 500 MB
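
The question also asks what the mistake was. The following is a minimal sketch, assuming the question's HTML (with the outer div closed) parsed by the built-in html.parser, of why a['text'] raises KeyError and why soup.string is None. The variable names are illustrative and not from the original post.

```python
from bs4 import BeautifulSoup

html = """<div class="systemRequirementsMainBox">
<div class="systemRequirementsRamContent">
<span title="000 Plus Minimum RAM Requirement">1 GB</span> </div></div>"""
soup = BeautifulSoup(html, "html.parser")
span = soup.select_one(".systemRequirementsRamContent span")

# Subscripting a tag reads an HTML attribute, so this returns the title.
title = span["title"]

# There is no attribute named "text" on the span, hence the KeyError.
try:
    span["text"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# The text between the tags is exposed as .text (or get_text()).
text = span.text

# .string is None whenever a tag has more than one child; the outer div
# here holds a newline and the inner div, so soup.string is None.
whole_document_string = soup.string

print(title, raised_key_error, text, whole_document_string)
```

In short, the loop in the question collected title strings rather than tags, and then printed soup.string for the whole document instead of the text of the matching span.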

Answered by Abu Shoeb

You can simply search for the span tag in BeautifulSoup, or you can narrow the search with other attributes such as class or title along with the span tag.

from bs4 import BeautifulSoup as BSHTML

htmlText = """<div class="systemRequirementsMainBox">
<div class="systemRequirementsRamContent">
<span title="000 Plus Minimum RAM Requirement">1 GB</span> </div>"""

soup = BSHTML(htmlText, "html.parser")
spans = soup.find_all('span')
# spans = soup.find_all('span', attrs={'class': 'your-class-name'})  # or span by class name
# spans = soup.find_all('span', attrs={'title': '000 Plus Minimum RAM Requirement'})  # or span with a title
for span in spans:
    print(span.text)
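
The commented-out variants can be tried on a small sample. This is a sketch only: the second span and the class name "ram" are hypothetical, added so that the filters return different results.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class "ram" and the second span are invented
# for illustration and do not come from the original page.
html = """<div class="systemRequirementsRamContent">
<span class="ram" title="000 Plus Minimum RAM Requirement">1 GB</span>
<span title="000 Plus Minimum Direct X Requirement">DX 9</span>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# Every span in the document.
all_spans = [s.text for s in soup.find_all("span")]

# Only spans carrying the given CSS class.
by_class = [s.text for s in soup.find_all("span", class_="ram")]

# Only spans whose title matches exactly.
by_title = [
    s.text
    for s in soup.find_all(
        "span", attrs={"title": "000 Plus Minimum Direct X Requirement"}
    )
]

print(all_spans, by_class, by_title)
```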

Answered by ganesh sai

contents[0]' after iterating over all the tags in the folder.

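
The answer above is truncated, so its exact intent is unclear. Assuming it refers to the contents list that BeautifulSoup exposes on each tag, a minimal sketch of reading contents[0] while iterating over the span tags might look like this (the sample HTML is the question's, with the outer div closed):

```python
from bs4 import BeautifulSoup

html = """<div class="systemRequirementsMainBox">
<div class="systemRequirementsRamContent">
<span title="000 Plus Minimum RAM Requirement">1 GB</span> </div></div>"""

soup = BeautifulSoup(html, "html.parser")

# contents is the list of a tag's direct children; for a span that only
# holds text, the first child is that text node.
first_children = [
    str(span.contents[0])
    for span in soup.find_all("span")
    if span.contents  # skip empty spans to avoid an IndexError
]

print(first_children)
```

Note that contents[0] returns only the first child, so for spans with nested tags .text or get_text() is usually the safer choice.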