Python TypeError: must be str, not NoneType
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/43566543/
Asked by Dylan Boyd
I'm writing my first "real" project, a web crawler, and I don't know how to fix this error. Here's my code:
import requests
from bs4 import BeautifulSoup

def main_spider(max_pages):
    page = 1
    for page in range(1, max_pages+1):
        url = "https://en.wikipedia.org/wiki/Star_Wars" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll("a"):
            href = "https://en.wikipedia.org/wiki/Star_Wars" + link.get("href")
            print(href)
        page += 1

main_spider(1)
Here's the error:
href = "https://en.wikipedia.org/wiki/Star_Wars" + link.get("href")
TypeError: must be str, not NoneType
Answered by Hymanywathy
The first "a" link on the Wikipedia page is:
<a id="top"></a>
Therefore, link.get("href") returns None, because that anchor has no href attribute.
To fix this, check for None first:
if link.get('href') is not None:
    href = "https://en.wikipedia.org/wiki/Star_Wars" + link.get("href")
    # do stuff here
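The check above can be tried without any network access. The following is a minimal sketch, assuming plain dicts stand in for the attribute mappings of the parsed anchors (an illustration only; the names anchors, BASE, and build_links are hypothetical and not part of the original answer). It looks the attribute up once and skips anchors that have no href:

```python
# Plain dicts stand in for the attributes of parsed <a> tags. This is an
# assumption for illustration; a BeautifulSoup tag offers a similar .get().
anchors = [
    {"id": "top"},                         # no href, like <a id="top"></a>
    {"href": "#mw-head"},
    {"href": "/wiki/Special:SiteMatrix"},
]

BASE = "https://en.wikipedia.org/wiki/Star_Wars"

def build_links(tags):
    """Return BASE + href for every tag that actually has an href."""
    links = []
    for tag in tags:
        href = tag.get("href")     # look the attribute up once
        if href is not None:       # skip anchors without an href
            links.append(BASE + href)
    return links

links = build_links(anchors)
print(links)
```

Storing the result of get() in a variable avoids calling it twice and keeps the None check and the concatenation on the same value.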
Answered by MSeifert
Not all anchors (<a> elements) need to have an href attribute (see https://www.w3schools.com/tags/tag_a.asp):
In HTML5, the <a> tag is always a hyperlink, but if it has no href attribute, it is only a placeholder for a hyperlink.
You already got the exception, and Python is great at handling exceptions, so why not catch it? This style is called "Easier to ask for forgiveness than permission" (EAFP) and is actually encouraged:
import requests
from bs4 import BeautifulSoup

def main_spider(max_pages):
    for page in range(1, max_pages+1):
        url = "https://en.wikipedia.org/wiki/Star_Wars" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll("a"):
            # The following part is new:
            try:
                href = "https://en.wikipedia.org/wiki/Star_Wars" + link.get("href")
                print(href)
            except TypeError:
                pass

main_spider(1)
Also, the page = 1 and page += 1 lines can be omitted. The for page in range(1, max_pages+1): statement is already sufficient here.
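One refinement of the EAFP style is to keep the try body as small as possible, so that a TypeError raised by unrelated code is not silently swallowed. The following is a minimal sketch under that assumption; the helper name join_or_none and the sample hrefs list are hypothetical and only illustrate the idea:

```python
BASE = "https://en.wikipedia.org/wiki/Star_Wars"

def join_or_none(href):
    """Return BASE + href, or None when href cannot be concatenated."""
    try:
        full = BASE + href     # raises TypeError when href is None
    except TypeError:
        return None            # forgiveness: skip this value
    else:
        return full            # runs only when no exception occurred

# Values like the ones link.get("href") produced for the first anchors.
hrefs = [None, "#mw-head", "#p-search"]
results = [join_or_none(h) for h in hrefs]
print(results)
```

Only the concatenation sits inside the try block, so the except clause handles exactly the failure described in the question.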
Answered by E. Ducateme
As noted by @Shiping, your code is not indented properly; I corrected it below. Also, link.get('href') does not return a string in one of the cases.
import requests
from bs4 import BeautifulSoup

def main_spider(max_pages):
    for page in range(1, max_pages+1):
        url = "https://en.wikipedia.org/wiki/Star_Wars" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll("a"):
            href = "https://en.wikipedia.org/wiki/Star_Wars" + link.get("href")
            print(href)

main_spider(1)
To evaluate what was happening, I added several lines of code between your existing lines and removed the offending line (for the time being).
soup = BeautifulSoup(plain_text, "html.parser")
print('All anchor tags:', soup.findAll('a'))  ### ADDED
for link in soup.findAll("a"):
    print(type(link.get("href")), link.get("href"))  ### ADDED
The result of my additions was this (truncated for brevity). Note that the first anchor does NOT have an href attribute, so link.get('href') has no value to return and returns None:
[<a id="top"></a>, <a href="#mw-head">navigation</a>,
<a href="#p-search">search</a>,
<a href="/wiki/Special:SiteMatrix" title="Special:SiteMatrix">sister...
<class 'NoneType'> None
<class 'str'> #mw-head
<class 'str'> #p-search
<class 'str'> /wiki/Special:SiteMatrix
<class 'str'> /wiki/File:Wiktionary-logo-v2.svg
...
To prevent the error, a possible solution is to add either a conditional or a try/except block to your code. I'll demo a conditional.
soup = BeautifulSoup(plain_text, "html.parser")
for link in soup.findAll("a"):
    if link.get('href') is None:
        continue
    else:
        href = "https://en.wikipedia.org/wiki/Star_Wars" + link.get("href")
        print(href)
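As an alternative to testing each link by hand, BeautifulSoup can filter out anchors without an href during the search itself. The sketch below assumes a small inline HTML snippet in place of the downloaded page (the snippet and the name PAGE_URL are illustrative, not from the answers). It also swaps the string concatenation for urljoin from the standard library, which is a different technique from the one used above and is shown only as an option for resolving relative links:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

PAGE_URL = "https://en.wikipedia.org/wiki/Star_Wars"

# Inline stand-in for the downloaded page (an assumption for illustration).
html = """
<a id="top"></a>
<a href="#mw-head">navigation</a>
<a href="/wiki/Special:SiteMatrix">sister</a>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True matches only <a> tags that carry an href attribute,
# so link["href"] is always a string inside the loop.
links = []
for link in soup.find_all("a", href=True):
    links.append(urljoin(PAGE_URL, link["href"]))

for url in links:
    print(url)
```

With this filter the anchor `<a id="top"></a>` never reaches the loop body, so no None check or try/except is needed there.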
Answered by dhhepting
I had the same error from different code. After adding a conditional inside a function, I thought that the return type was not being set properly. What I realized was that when the condition was False, the return statement was not being called at all, so the function returned None. A change to my indentation fixed the problem.
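The situation described in this answer can be reproduced in a few lines. This is a minimal sketch with hypothetical function names (describe_buggy, describe_fixed), not the answerer's actual code: when the return statement is indented under the if, the function falls off the end and returns None whenever the condition is False.

```python
def describe_buggy(name, shout):
    text = "hello " + name
    if shout:
        text = text.upper()
        return text        # only reached when shout is True

def describe_fixed(name, shout):
    text = "hello " + name
    if shout:
        text = text.upper()
    return text            # dedented: reached on every call

buggy_result = describe_buggy("bob", False)   # implicit None
fixed_result = describe_fixed("bob", False)

try:
    message = "Greeting: " + buggy_result     # str + None
except TypeError:
    message = "TypeError raised"

print(buggy_result, fixed_result, message)
```

Concatenating the implicit None with a string raises the same kind of TypeError as in the question, even though no HTML parsing is involved.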