
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/36709165/

Time: 2020-08-19 18:14:49  Source: igfitidea

BeautifulSoup: object of type 'Response' has no len()

Tags: python, html, parsing, web-scraping, beautifulsoup

Question by Bryan

Issue: when I try to execute the script, BeautifulSoup(html, ...) gives the error message "TypeError: object of type 'Response' has no len()". I tried passing the actual HTML as a parameter, but it still doesn't work.


import requests
from bs4 import BeautifulSoup

url = 'http://vineoftheday.com/?order_by=rating'
response = requests.get(url)
html = response.content  # raw response body as bytes

soup = BeautifulSoup(html, "html.parser")
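The message names the type Response, which suggests that in the failing run the Response object returned by requests.get, rather than response.content or response.text, reached the BeautifulSoup constructor. The offline sketch below (assuming requests and bs4 are installed) uses an empty requests.Response() as a stand-in for a real download; the exact wording of the TypeError raised by BeautifulSoup may vary between bs4 versions.

```python
import requests
from bs4 import BeautifulSoup

# An empty Response object built locally, so no network access is needed
response = requests.Response()
response.status_code = 200

# Response defines no __len__, which is where the message comes from
try:
    len(response)
except TypeError as exc:
    print(exc)  # object of type 'Response' has no len()

# Passing the Response object itself to BeautifulSoup also raises TypeError
try:
    BeautifulSoup(response, "html.parser")
except TypeError as exc:
    print("BeautifulSoup rejected the Response object:", exc)

# Passing a str (which is what response.text returns) parses normally
html_text = "<html><body><p>Hello</p></body></html>"
soup = BeautifulSoup(html_text, "html.parser")
print(soup.p.get_text())  # Hello
```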

Answer by Matvei Nazaruk

You are getting response.content, but it returns the response body as bytes (docs). You should pass a str to the BeautifulSoup constructor (docs), so you need to use response.text instead of content.
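A small offline sketch of the difference between the two attributes. raw_body and decoded_body are hypothetical stand-ins for response.content and response.text; a real response.text is decoded with the encoding requests infers for the response, not necessarily UTF-8. It also shows that BeautifulSoup accepts bytes as well as str: given bytes, it detects the encoding itself and records it in soup.original_encoding, so response.text is the simpler choice rather than the only one.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins: raw_body plays response.content (bytes),
# decoded_body plays response.text (str)
raw_body = (
    '<html><head><meta charset="utf-8"></head>'
    '<body><p>Café</p></body></html>'
).encode("utf-8")
decoded_body = raw_body.decode("utf-8")

print(type(raw_body).__name__, type(decoded_body).__name__)  # bytes str

soup_from_bytes = BeautifulSoup(raw_body, "html.parser")
soup_from_str = BeautifulSoup(decoded_body, "html.parser")

# Both inputs yield the same text once parsed
print(soup_from_bytes.p.get_text())  # Café
print(soup_from_str.p.get_text())    # Café

# Only the bytes input needed encoding detection; str input records None
print(soup_from_bytes.original_encoding)
print(soup_from_str.original_encoding)  # None
```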


Answer by Jorge

Try to pass the HTML text directly


soup = BeautifulSoup(response.text, "html.parser")

Answer by Moshe G

If you're using requests.get('https://example.com') to get the HTML, you should use requests.get('https://example.com').text.


Answer by Atul

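You are not getting the HTML in 'response': it is a Response object, and printing it shows only the status code (e.g. <Response [200]>). Also, always send a browser User-Agent header; otherwise some sites will block or mishandle your requests.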
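Printing a Response is what makes it look like it only holds a status code: its repr shows just the code, while the downloaded HTML lives on response.text (str) and response.content (bytes). A minimal offline sketch, assuming requests is installed; an empty requests.Response() stands in for a real download.

```python
import requests

# An empty Response built locally so the example needs no network access
response = requests.Response()
response.status_code = 200

# Printing the object shows only its status code...
print(response)  # <Response [200]>

# ...but the body is a separate attribute. On a real download .text holds
# the decoded HTML; here nothing was downloaded, so it is an empty string.
print(repr(response.text))  # ''
```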


You can find your browser's User-Agent in the developer tools: in the Network tab, select a request and look for User-Agent under its headers.


Try


import requests
from bs4 import BeautifulSoup

url = 'http://www.google.com'
# Browser User-Agent copied from the developer tools
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/71.0.3578.98 Safari/537.36'}

# .text gives the HTML as a str, not the Response object
response = requests.get(url, headers=headers).text

soup = BeautifulSoup(response, 'html.parser')
print(soup.prettify())
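When several pages are fetched from the same site, the header can be set once on a requests.Session instead of passing headers= to every call. A sketch assuming requests is installed; it reuses the User-Agent string from the answer, and the network call is left as a comment so the snippet runs offline.

```python
import requests

# User-Agent string taken from the answer's example
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/71.0.3578.98 Safari/537.36"
)

session = requests.Session()

# By default, requests identifies itself as python-requests/<version>
print(session.headers["User-Agent"])

# Set the header once; every request made through this session sends it
session.headers.update({"User-Agent": BROWSER_UA})
print(session.headers["User-Agent"] == BROWSER_UA)  # True

# Usage (needs network access, so it is left commented out here):
# html = session.get("http://www.google.com").text
```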

Answer by Ozcar Nguyen

It worked for me:


soup = BeautifulSoup(requests.get("your_url").text)

The code below is better (it uses the lxml parser):


import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("your_url").text, 'lxml')
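The lxml parser is a separate package (pip install lxml); if it is missing, BeautifulSoup raises bs4.FeatureNotFound instead of parsing. A sketch of a hypothetical make_soup helper that prefers lxml and falls back to Python's built-in html.parser; the sample markup stands in for requests.get(url).text so it runs offline.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred="lxml", fallback="html.parser"):
    """Hypothetical helper: parse with the preferred parser, else the built-in one."""
    try:
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        # Raised when the requested parser (e.g. lxml) is not installed
        return BeautifulSoup(markup, fallback)


# Offline usage; in a real script markup would be requests.get(url).text
soup = make_soup("<html><head><title>Example page</title></head></html>")
print(soup.title.string)  # Example page
```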