Python: why is the connection refused?
Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
原文地址: http://stackoverflow.com/questions/21081544/
Why is the connection refused?
Asked by Peter Lazarov
I am creating a web scraping script and divided it into four pieces. Separately they all work perfectly, however when I put them all together I get the following error: urlopen error [Errno 111] Connection refused. I have looked at questions similar to mine and have tried to catch the error with try-except, but even that doesn't work. My all-in-one code is:
from selenium import webdriver
import re
import urllib2

site = ""

def phone():
    global site
    site = "https://www." + site
    if "spokeo" in site:
        browser = webdriver.Firefox()
        browser.get(site)
        content = browser.page_source
        browser.quit()
        m_obj = re.search(r"(\(\d{3}\)\s\d{3}-\*{4})", content)
        if m_obj:
            print m_obj.group(0)
    elif "addresses" in site:
        usock = urllib2.urlopen(site)
        data = usock.read()
        usock.close()
        m_obj = re.search(r"(\(\d{3}\)\s\d{3}-\d{4})", data)
        if m_obj:
            print m_obj.group(0)
    else:
        usock = urllib2.urlopen(site)
        data = usock.read()
        usock.close()
        m_obj = re.search(r"(\d{3}-\s\d{3}-\d{4})", data)
        if m_obj:
            print m_obj.group(0)

def pipl():
    global site
    url = "https://pipl.com/search/?q=tom+jones&l=Phoenix%2C+AZ%2C+US&sloc=US|AZ|Phoenix&in=6"
    usock = urllib2.urlopen(url)
    data = usock.read()
    usock.close()
    r_list = [#re.compile("spokeo.com/[^\s]+"),
              re.compile("addresses.com/[^\s]+"),
              re.compile("10digits.us/[^\s]+")]
    for r in r_list:
        match = re.findall(r, data)
        for site in match:
            site = site[:-6]
            print site
            phone()

pipl()
Here is my traceback:
Traceback (most recent call last):
  File "/home/lazarov/.spyder2/.temp.py", line 48, in <module>
    pipl()
  File "/home/lazarov/.spyder2/.temp.py", line 46, in pipl
    phone()
  File "/home/lazarov/.spyder2/.temp.py", line 25, in phone
    usock = urllib2.urlopen(site)
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1215, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>
After manually debugging the code I found that the error comes from the function phone(), so I tried to run just that piece:
import re
import urllib2

url = 'http://www.10digits.us/n/Tom_Jones/Phoenix_AZ/1fe293a0b7'
usock = urllib2.urlopen(url)
data = usock.read()
usock.close()
m_obj = re.search(r"(\d{3}-\d{3}-\d{4})", data)
if m_obj:
    print m_obj.group(0)
And it worked. Which, I believe, shows that it's not a case of a firewall actively denying the connection, or of the respective service not being started on the other site, or being overloaded. Any help would be appreciated.
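For reference, a try-except around the fetch only helps if it targets urllib2.URLError specifically. A minimal sketch of that pattern (written here with Python 3's urllib.request standing in for the original urllib2, and with a hypothetical opener parameter added purely so the function can be exercised without network access):

```python
# Sketch: catch URLError so a refused connection reports which URL
# failed and is skipped, instead of aborting the whole crawl.
import urllib.error
import urllib.request


def fetch(url, opener=urllib.request.urlopen):
    """Return the page body as bytes, or None if the URL is unreachable."""
    try:
        with opener(url) as usock:
            return usock.read()
    except urllib.error.URLError as err:
        # "[Errno 111] Connection refused" arrives here as err.reason
        print("skipping %s: %s" % (url, err.reason))
        return None
```

By default the function uses urllib.request.urlopen; the printed message names the failing URL, which is exactly the information the traceback above obscures.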
Accepted answer by furins
Usually the devil is in the detail.
According to your traceback...
  File "/usr/lib/python2.7/urllib2.py", line 1215, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
and your source code...
site = "https://www." + site
...I may suppose that in your code you are trying to access https://www.10digits.us/n/Tom_Jones/Phoenix_AZ/1fe293a0b7, whereas in your test you are connecting to http://www.10digits.us/n/Tom_Jones/Phoenix_AZ/1fe293a0b7.
Try replacing https with http (at least for www.10digits.us): probably the website you are trying to scrape does not respond on port 443 but only on port 80 (you can check it even with your browser).
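The suggested fix can be sketched as a small helper that picks the scheme per host instead of hard-coding "https://www." for every site. Python 3 syntax is used here, and the HTTP_ONLY set is an assumption: from this question only 10digits.us is known to refuse https.

```python
# Sketch of the fix: build the URL with http for hosts that do not
# listen on port 443. HTTP_ONLY is an assumed allow-list; per the
# answer, 10digits.us is the host known to refuse https connections.
HTTP_ONLY = {"10digits.us"}


def build_url(site):
    """Prefix a bare site path with the scheme the host actually serves."""
    scheme = "http" if any(host in site for host in HTTP_ONLY) else "https"
    return "%s://www.%s" % (scheme, site)
```

With this in place, the line `site = "https://www." + site` in phone() would become `site = build_url(site)`, so addresses.com keeps https while 10digits.us is fetched over plain http.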

