Python: AttributeError: 'NoneType' object has no attribute 'findNext'

Question

Asked by user3247140

I am trying to scrape a website with BeautifulSoup but have run into a problem. I was following a tutorial written for Python 2.7; it used exactly the same code and had no problems.

import urllib.request
from bs4 import *


htmlfile = urllib.request.urlopen("http://en.wikipedia.org/wiki/Steve_Jobs")

htmltext = htmlfile.read()

soup = BeautifulSoup(htmltext)
title = (soup.title.text)

body = soup.find("Born").findNext('td')
print (body.text)

If I try to run the program, I get:

Traceback (most recent call last):
  File "C:\Users\USER\Documents\Python Programs\World Population.py", line 13, in <module>
    body = soup.find("Born").findNext('p')
AttributeError: 'NoneType' object has no attribute 'findNext'

Is this a problem with Python 3, or am I just too naive?

Answer 1

Accepted answer by paxdiablo

The find and find_all methods do not search for arbitrary text in the document; they search for HTML tags. The documentation makes that clear (my italics):

Pass in a value for name and you'll tell Beautiful Soup to only consider tags with certain names. Text strings will be ignored, as will tags whose names that don't match. This is the simplest usage:

soup.find_all("title")
# [<title>The Dormouse's story</title>]

That's why your soup.find("Born") is returning None, and hence why it complains about NoneType (the type of None) having no findNext() method.
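
The behaviour described above can be reproduced without touching the network. The following is a minimal sketch, assuming a tiny made-up HTML fragment (not the real Wikipedia page) and the built-in "html.parser" backend; it shows find returning None for a name that is not a tag, and one way to guard against chaining a call onto None.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: "Born" appears only as text, never as a tag name.
html = "<table><tr><th>Born</th><td>1955-02-24</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

missing = soup.find("Born")   # searches for a <Born> tag, which does not exist
print(missing)                # None

header = soup.find("th")      # searches for a <th> tag, which does exist
print(header.text)            # Born

# Check for None before chaining another call onto the result.
cell = missing.find_next("td") if missing is not None else None
print(cell)                   # None
```

Without the None check, the last lookup would raise the same AttributeError as in the question.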

That page you reference contains (at the time this answer was written) eight copies of the word "born", none of which are tags.

Looking at the HTML source for that page, you'll find the best option may be to look for the correct span:

<th scope="row" style="text-align:left;">Born</th>
    <td><span class="nickname">Steven Paul Jobs</span><br />
    <span style="display:none">(<span class="bday">1955-02-24</span>)</span>February 24, 1955<br />
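
One possible way to get what the question was after is to find the th tag whose text is "Born" and then step to the next td. This is only a sketch: the HTML below is the fragment quoted above wrapped in a made-up table so it can run offline, and the string= argument assumes Beautiful Soup 4.4 or later (older releases call it text=). The live page may be structured differently.

```python
from bs4 import BeautifulSoup

# The infobox row quoted above, wrapped in a minimal table for illustration.
html = """
<table><tr>
<th scope="row" style="text-align:left;">Born</th>
<td><span class="nickname">Steven Paul Jobs</span><br />
<span style="display:none">(<span class="bday">1955-02-24</span>)</span>February 24, 1955<br />
</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# Match a <th> tag by name AND by its text, rather than a tag named "Born".
born_header = soup.find("th", string="Born")

born_text = None
bday = None
if born_header is not None:
    born_cell = born_header.find_next("td")          # next <td> in document order
    born_text = born_cell.get_text(" ", strip=True)  # all text inside the cell
    bday = born_cell.find("span", class_="bday").text

print(born_text)
print(bday)
```

find_next is the current spelling of the findNext method used in the question; both names refer to the same operation.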

Answer 2

Answer by Steinar Lima

The find method looks for tags, not text. To find the name, birthday and birthplace, you would have to look up the span elements with the corresponding class name, and access the text attribute of that item:

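import urllib.request
from bs4 import BeautifulSoup

# Pass the response object straight to BeautifulSoup; naming the parser
# explicitly avoids the "no parser was explicitly specified" warning.
response = urllib.request.urlopen("http://en.wikipedia.org/wiki/Steve_Jobs")
soup = BeautifulSoup(response, "html.parser")
title = soup.title.text
# Each value sits in a <span> identified by its class attribute.
name = soup.find('span', {'class': 'nickname'}).text
bday = soup.find('span', {'class': 'bday'}).text
birthplace = soup.find('span', {'class': 'birthplace'}).text

print(name)
print(bday)
print(birthplace)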

Output:

Steven Paul Jobs
1955-02-24
San Francisco, California, US

PS: You don't have to call read on the object returned by urlopen; BeautifulSoup accepts file-like objects.
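
To illustrate the point about file-like objects without making a network request, here is a small sketch in which io.BytesIO stands in for the response returned by urlopen. The HTML content is made up for the example; the assumption is only that both objects expose a read() method.

```python
import io
from bs4 import BeautifulSoup

# io.BytesIO plays the role of the urlopen response: it has a read() method.
fake_response = io.BytesIO(b"<html><head><title>Steve Jobs</title></head></html>")

# No explicit .read() call; BeautifulSoup reads from the object itself.
soup = BeautifulSoup(fake_response, "html.parser")
title_text = soup.title.text
print(title_text)  # Steve Jobs
```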
