Python httplib.BadStatusLine: ''

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/27619258/

Date: 2020-08-19 02:01:16  Source: igfitidea

httplib.BadStatusLine: ''

python selenium scrapy

Asked by Jonathan Sitruk

As always, I frequently have issues, and I have thoroughly searched for an answer to the current one but find myself at a loss. Here are some of the places I have searched:
- How to fix httplib.BadStatusLine exception?
- Python httplib2 Handling Exceptions
- python http status code

My issue is the following. I have created a spider and want to crawl different URLs. When I crawl each URL independently, everything works fine. However, when I try to crawl both, I get the following error: httplib.BadStatusLine: ''

I have followed some advice that I read (see the links mentioned above). Printing response.status works for each request, but response.url is never printed and the error is thrown. (I only print both values to try to identify the source of the error.)

I hope that this is clear.

I am using Scrapy and Selenium.

from scrapy import Spider
from selenium import webdriver


class PeoplePage(Spider):
    name = "peopleProfile"
    allowed_domains = ["blah.com"]
    handle_httpstatus_list = [200, 404]
    start_urls = [
        "url1",
        "url2"
    ]

    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse(self, response):
        print response.status
        print '???????????????????????????????????'
        if response.status == 200:
            self.driver.implicitly_wait(5)
            self.driver.get(response.url)
            print response.url
            print '!!!!!!!!!!!!!!!!!!!!'

            # DO STUFF

        self.driver.close()

Accepted answer by Nima Soroush

According to the Python documentation, httplib.BadStatusLine is raised if a server responds with an HTTP status code that we don't understand. You can try catching and ignoring this exception. You should also not close your driver if you are going to load more than one URL.

Try this:

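import httplib  # Python 2 module; it was renamed http.client in Python 3


def parse(self, response):
    try:
        print response.status
        print '???????????????????????????????????'
        if response.status == 200:
            self.driver.implicitly_wait(5)
            self.driver.get(response.url)
            print response.url
            print '!!!!!!!!!!!!!!!!!!!!'

            # DO STUFF
    except httplib.BadStatusLine:
        # Ignore the error and move on to the next response.
        pass
    # Note: no self.driver.close() here, so the driver stays usable for the next URL.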
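For anyone wondering where an empty status line comes from, here is a minimal sketch using only the standard library. It assumes Python 3, where httplib is called http.client; the helper names serve_once_and_hang_up and request_against_silent_server are made up for this illustration. A local socket accepts one request and then hangs up without replying, which is one situation that produces the '' in the error message. On Python 3.5 and later the exception you actually see is http.client.RemoteDisconnected, which is a subclass of BadStatusLine, so the except clause still catches it.

```python
import http.client
import socket
import threading


def serve_once_and_hang_up(server_sock):
    """Accept one connection, read the request, then close without replying."""
    conn, _ = server_sock.accept()
    conn.recv(65536)  # consume the request so the close is a clean shutdown
    conn.close()


def request_against_silent_server():
    """Return the name of the exception raised when the server sends nothing."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    thread = threading.Thread(target=serve_once_and_hang_up, args=(server_sock,))
    thread.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        client.getresponse()  # no status line ever arrives
    except http.client.BadStatusLine as exc:
        return type(exc).__name__
    finally:
        client.close()
        thread.join()
        server_sock.close()
    return None


if __name__ == "__main__":
    print(request_against_silent_server())
```

In the question's setup the "server" is most likely the browser process that Selenium talks to, which stops answering once the driver has been closed.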

Answer by Aaron Lelevier

I made a decorator to do what the top answer does, so as to make the code easily reusable. Here it is:

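import functools
import http.client  # a plain `import http` does not load the `client` submodule


def pass_bad_status_line_exc(wrapped_function):
    """
    Decorator that silently swallows `http.client.BadStatusLine`.

    The wrapped function returns None when the exception is raised.
    """
    @functools.wraps(wrapped_function)
    def _wrapper(*args, **kwargs):
        try:
            return wrapped_function(*args, **kwargs)
        except http.client.BadStatusLine:
            return None
    return _wrapper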
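Swallowing the exception silently can hide real problems, so here is a hedged variant of the same idea that logs a warning and returns a caller-chosen default. This is only a sketch: log_bad_status_line and fetch_title are hypothetical names, and fetch_title just raises the exception to stand in for a real network call.

```python
import functools
import http.client
import logging

logger = logging.getLogger(__name__)


def log_bad_status_line(default=None):
    """Decorator factory: log BadStatusLine and return `default` instead of raising."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except http.client.BadStatusLine as exc:
                logger.warning("%s raised %r; returning %r", func.__name__, exc, default)
                return default
        return wrapper
    return decorator


@log_bad_status_line(default="")
def fetch_title(url):
    # Hypothetical stand-in for a real request: always fails with an empty status line.
    raise http.client.BadStatusLine("")


if __name__ == "__main__":
    print(repr(fetch_title("http://example.invalid/")))
```

The design choice here is simply that the caller decides what "no result" looks like, and the log keeps a record of how often the error happens.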

Answer by duhaime

I hit this error because I defined a selenium.webdriver instance (named driver), called driver.quit() on it, then tried to call driver.get(url) on the driver that had already quit. The solution is to not call driver.quit() until you are done with the driver.
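One way to make this mistake easier to spot is a thin wrapper that refuses to be used after quit(). The sketch below is an assumption-laden illustration, not part of Selenium: GuardedDriver and FakeDriver are hypothetical names, and FakeDriver only mimics the get(url) and quit() methods so the example runs without a browser. In real code you would pass a selenium webdriver instance instead.

```python
class GuardedDriver:
    """Wrap any object with get(url) and quit(), and refuse use after quit()."""

    def __init__(self, driver):
        self._driver = driver
        self._closed = False

    def get(self, url):
        if self._closed:
            # Fail with a clear message instead of an obscure BadStatusLine.
            raise RuntimeError("driver already quit; create a new one before calling get()")
        return self._driver.get(url)

    def quit(self):
        if not self._closed:
            self._closed = True
            self._driver.quit()


class FakeDriver:
    """Hypothetical stand-in for a selenium webdriver, used only for this demo."""

    def __init__(self):
        self.visited = []

    def get(self, url):
        self.visited.append(url)

    def quit(self):
        pass


if __name__ == "__main__":
    driver = GuardedDriver(FakeDriver())
    driver.get("url1")
    driver.get("url2")  # reusing the open driver is fine
    driver.quit()       # quit once, after the last URL
```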

Answer by John Conrad Geenty

I'm not sure how much this will help, but in my case I was trying to issue a POST request, and you need a new HTTP connection in order to do it. You can't use the same connection for multiple requests. I kept getting the same error: httplib.BadStatusLine: ''. I believe the documentation outlines this; I just overlooked it.
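As a rough sketch of that advice, the helper below opens a fresh connection for every POST and closes it afterwards. It assumes Python 3 (http.client rather than httplib), and post_with_fresh_connection, EchoLengthHandler, and demo are hypothetical names; the tiny local server exists only so the example can run on its own.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoLengthHandler(BaseHTTPRequestHandler):
    """Local test handler: replies with the number of bytes it received."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = str(length).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def post_with_fresh_connection(host, port, path, payload):
    """Open a new HTTPConnection for this one POST and close it afterwards."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("POST", path, body=payload)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


def demo():
    """Send two POSTs to a throwaway local server, one connection each."""
    server = HTTPServer(("127.0.0.1", 0), EchoLengthHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.start()
    try:
        port = server.server_address[1]
        return [post_with_fresh_connection("127.0.0.1", port, "/", b"x" * n)
                for n in (3, 5)]
    finally:
        server.shutdown()
        thread.join()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```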
