BeautifulSoup: object of type 'Response' has no len()
Source: http://stackoverflow.com/questions/36709165/
Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the source address and author information, and attribute it to the original authors (not me): StackOverflow
Asked by Bryan
Issue: when I try to execute the script, BeautifulSoup(html, ...) gives the error message "TypeError: object of type 'Response' has no len()". I tried passing the actual HTML as a parameter, but it still doesn't work.
import requests
from bs4 import BeautifulSoup
url = 'http://vineoftheday.com/?order_by=rating'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html, "html.parser")
Answered by Matvei Nazaruk
You are using response.content, which returns the response body as bytes (docs). You should pass a str to the BeautifulSoup constructor (docs), so use response.text instead of response.content.
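To illustrate the bytes/str distinction mentioned in this answer, and where the len() message comes from, here is a minimal sketch. FakeResponse is a hypothetical stand-in, not the requests class; it only mimics the .content and .text attributes so the snippet runs without network access. The message in the question is what Python raises when len() is called on any object that defines no __len__ method.

```python
class FakeResponse:
    """Hypothetical stand-in for an HTTP response object; it defines no __len__."""

    def __init__(self, body: bytes, encoding: str = "utf-8") -> None:
        self.content = body          # raw body as bytes
        self.encoding = encoding     # encoding used to decode the body

    @property
    def text(self) -> str:
        # Decoded body as str
        return self.content.decode(self.encoding)


response = FakeResponse("<p>café</p>".encode("utf-8"))

# Both the bytes body and the decoded str support len()
print(type(response.content).__name__, len(response.content))
print(type(response.text).__name__, len(response.text))

# The response object itself does not support len()
error_message = ""
try:
    len(response)
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

If the traceback names 'Response', it is worth checking that the object handed to the parser is the body (.text or .content) and not the response object itself.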
Answered by Jorge
Try passing the HTML text directly:
soup = BeautifulSoup(response.text, "html.parser")
Answered by Moshe G
If you're using requests.get('https://example.com') to get the HTML, you should use requests.get('https://example.com').text.
Answered by Atul
In 'response' you are getting only the Response object, which displays just the status code, not the HTML. Also, always send a browser User-Agent header; otherwise you will run into many issues.
You can find the header value in your browser's developer tools: open the Network tab, select a request, and look for User-Agent under the request headers.
Try:
import requests
from bs4 import BeautifulSoup
url = 'http://www.google.com'
# Browser-like User-Agent string, copied from the browser's developer tools
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}
# .text returns the decoded HTML as a str
response = requests.get(url, headers=headers).text
soup = BeautifulSoup(response, 'html.parser')
print(soup.prettify())
Answered by Ozcar Nguyen
This worked for me:
soup = BeautifulSoup(requests.get("your_url").text)
The code below is better (it uses the lxml parser, which must be installed separately):
import requests
from bs4 import BeautifulSoup
soup = BeautifulSoup(requests.get("your_url").text, 'lxml')
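Since every answer here comes down to handing the parser the HTML rather than the response object, one possible guard is a small helper that normalizes its input first. The sketch below is only an illustration: as_markup and StubResponse are hypothetical names, not part of requests or bs4, and the helper assumes a response-like object exposes its decoded body as a str attribute named text.

```python
def as_markup(source) -> str:
    """Return an HTML string from a str or from an object with a str `.text` attribute."""
    if isinstance(source, str):
        return source
    text = getattr(source, "text", None)
    if isinstance(text, str):
        return text
    raise TypeError(
        f"expected str or a response-like object, got {type(source).__name__}"
    )


class StubResponse:
    """Hypothetical response-like object used only for this demonstration."""
    text = "<title>demo</title>"


html = as_markup(StubResponse())
# html could now be handed to the parser, e.g. BeautifulSoup(html, "html.parser")
print(html)
```

With a helper like this, a wrong argument fails early with a message that names the offending type instead of surfacing inside the parser.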