Source interface with Python and urllib2
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/1150332/
Asked by jonasl
How do I set the source IP/interface with Python and urllib2?
Answered by Alex Martelli
Unfortunately the stack of standard library modules in use (urllib2, httplib, socket) is somewhat badly designed for the purpose -- at the key point in the operation, HTTPConnection.connect (in httplib) delegates to socket.create_connection, which in turn gives you no "hook" whatsoever between the creation of the socket instance sock and the sock.connect call, for you to insert the sock.bind just before sock.connect that is what you need to set the source IP (I'm evangelizing widely for NOT designing abstractions in such an airtight, excessively-encapsulated way -- I'll be speaking about that at OSCON this Thursday under the title "Zen and the Art of Abstraction Maintenance" -- but here your problem is how to deal with a stack of abstractions that WERE designed this way, sigh).
When you're facing such problems you only have two not-so-good solutions: either copy, paste and edit the misdesigned code into which you need to place a "hook" that the original designer didn't cater for; or, "monkey-patch" that code. Neither is GOOD, but both can work, so at least let's be thankful that we have such options (by using an open-source and dynamic language). In this case, I think I'd go for monkey-patching (which is bad, but copy and paste coding is even worse) -- a code fragment such as:
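import socket

# sourceIP is not defined in the original answer; set it to the local
# address that outgoing sockets should bind to before running this.
true_socket = socket.socket

def bound_socket(*a, **k):
    # Create the real socket, then bind it to the chosen source address;
    # port 0 lets the operating system pick a free local port.
    sock = true_socket(*a, **k)
    sock.bind((sourceIP, 0))
    return sock

socket.socket = bound_socket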
Depending on your exact needs (do you need all sockets to be bound to the same source IP, or...?) you could simply run this before using urllib2 normally, or (in more complex ways of course) run it at need just for those outgoing sockets you DO need to bind in a certain way (then each time restore socket.socket = true_socket to get out of the way for future sockets yet to be created). The second alternative adds its own complications to orchestrate properly, so I'm waiting for you to clarify whether you do need such complications before explaining them all.
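The second option (patch only while needed, then restore) could be wrapped in a context manager so the restore step cannot be forgotten. This is a minimal sketch and not part of the original answer: the name bound_sockets is made up for illustration, and it assumes a single-threaded program, since socket.socket is replaced process-wide while the block runs.

```python
import contextlib
import socket


@contextlib.contextmanager
def bound_sockets(source_ip):
    """Temporarily make every newly created socket bind to source_ip.

    Hypothetical helper: not thread-safe, and binding will fail for
    sockets whose address family does not match source_ip.
    """
    true_socket = socket.socket

    def bound_socket(*a, **k):
        sock = true_socket(*a, **k)
        sock.bind((source_ip, 0))  # port 0: let the OS choose a local port
        return sock

    socket.socket = bound_socket
    try:
        yield
    finally:
        # Always restore the real constructor, even if the body raised.
        socket.socket = true_socket
```

Usage would look like: with bound_sockets("192.168.1.10"): urllib2.urlopen(url). Sockets created outside the with block are unaffected.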
AKX's good answer is a variant on the "copy / paste / edit" alternative so I don't need to expand much on that -- note however that it doesn't exactly reproduce socket.create_connection in its connect method, see the source here (at the very end of the page) and decide what other functionality of the create_connection function you may want to embody in your copied/pasted/edited version if you decide to go that route.
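To make concrete what a copied and edited create_connection might need to keep, here is a rough sketch of the usual pattern: resolve the host with getaddrinfo, try each candidate address in turn, and bind before connecting. The function name create_bound_connection and its parameters are illustrative assumptions, not the standard library's code. From Python 2.7 on, socket.create_connection itself accepts a source_address argument, so a copy like this mainly matters on older versions.

```python
import socket


def create_bound_connection(address, timeout=None, source_address=None):
    """Connect to (host, port), binding to source_address first.

    Hypothetical stand-in for socket.create_connection: it walks the
    getaddrinfo results and returns the first socket that connects.
    """
    host, port = address
    err = None
    for af, socktype, proto, _canon, sa in socket.getaddrinfo(
            host, port, 0, socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(af, socktype, proto)
            if timeout is not None:
                sock.settimeout(timeout)
            if source_address:
                sock.bind(source_address)  # the missing "hook"
            sock.connect(sa)
            return sock
        except socket.error as e:
            err = e
            if sock is not None:
                sock.close()
    if err is not None:
        raise err
    raise socket.error("getaddrinfo returned an empty list")
```

The loop over candidate addresses and the timeout handling are the parts that a bare socket.socket() plus connect() version leaves out.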
Answered by AKX
This seems to work.
import urllib2, httplib, socket

class BindableHTTPConnection(httplib.HTTPConnection):
    def connect(self):
        """Connect to the host and port specified in __init__."""
        self.sock = socket.socket()
        # Bind to the source IP before connecting; port 0 means any free port.
        self.sock.bind((self.source_ip, 0))
        if isinstance(self.timeout, float):
            self.sock.settimeout(self.timeout)
        self.sock.connect((self.host, self.port))

def BindableHTTPConnectionFactory(source_ip):
    # Returns a callable that urllib2 can use in place of HTTPConnection.
    def _get(host, port=None, strict=None, timeout=0):
        bhc = BindableHTTPConnection(host, port=port, strict=strict, timeout=timeout)
        bhc.source_ip = source_ip
        return bhc
    return _get

class BindableHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        return self.do_open(BindableHTTPConnectionFactory('127.0.0.1'), req)

opener = urllib2.build_opener(BindableHTTPHandler)
opener.open("http://google.com/").read()  # Will fail, 127.0.0.1 can't reach google.com.
You'll need to figure out some way to parameterize "127.0.0.1" there, though.
Answered by Jon Parise
Here's a further refinement that makes use of HTTPConnection's source_address argument (introduced in Python 2.7):
import functools
import httplib
import urllib2

class BoundHTTPHandler(urllib2.HTTPHandler):

    def __init__(self, source_address=None, debuglevel=0):
        urllib2.HTTPHandler.__init__(self, debuglevel)
        self.http_class = functools.partial(httplib.HTTPConnection,
                                            source_address=source_address)

    def http_open(self, req):
        return self.do_open(self.http_class, req)
This gives us a custom urllib2.HTTPHandler implementation that is source_address aware. We can add it to a new urllib2.OpenerDirector and install it as the default opener (for future urlopen() calls) with the following code:
handler = BoundHTTPHandler(source_address=("192.168.1.10", 0))
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)
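If only some requests should leave from a given address, one alternative to installing a global default is to keep a separate opener per source IP and call its open() method directly. The sketch below assumes that approach: opener_for is a hypothetical helper name, the handler is repeated so the block is self-contained, and the import fallback to the Python 3 module names is an addition that the original answer does not cover. Only plain HTTP requests go through this handler; HTTPS requests would still use the default handler.

```python
import functools

try:  # Python 2, as used throughout this page
    import httplib
    import urllib2
except ImportError:  # Python 3 renamed both modules
    import http.client as httplib
    import urllib.request as urllib2


class BoundHTTPHandler(urllib2.HTTPHandler):
    """Same handler as above, repeated so the sketch is self-contained."""

    def __init__(self, source_address=None, debuglevel=0):
        urllib2.HTTPHandler.__init__(self, debuglevel)
        self.http_class = functools.partial(httplib.HTTPConnection,
                                            source_address=source_address)

    def http_open(self, req):
        return self.do_open(self.http_class, req)


def opener_for(source_ip):
    """Build a private opener whose HTTP requests bind to source_ip."""
    handler = BoundHTTPHandler(source_address=(source_ip, 0))
    # build_opener drops the default HTTPHandler when given a subclass.
    return urllib2.build_opener(handler)
```

For example, opener_for("192.168.1.10").open(url) would send that one request from 192.168.1.10 without touching the behavior of urllib2.urlopen().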
Answered by Marshall Weir
I thought I'd follow up with a slightly better version of the monkey patch. If you need to be able to set different port options on some of the sockets or are using something like SSL that subclasses socket, the following code works a bit better.
_ip_address = None

def bind_outgoing_sockets_to_ip(ip_address):
    """This binds all python sockets to the passed in ip address"""
    global _ip_address
    _ip_address = ip_address

import socket
from socket import socket as s

class bound_socket(s):
    def connect(self, *args, **kwargs):
        if self.family == socket.AF_INET:
            if self.getsockname()[0] == "0.0.0.0" and _ip_address:
                self.bind((_ip_address, 0))
        s.connect(self, *args, **kwargs)

socket.socket = bound_socket
Binding only at connect time matters if the same process also needs to run something like a web server that must bind to a different IP address.
Answered by Reid
Reasoning that I should monkey-patch at the highest level available, here's an alternative to Alex's answer which patches httplib instead of socket, taking advantage of httplib.HTTPSConnection.__init__()'s source_address keyword argument (which is not exposed by urllib2, AFAICT). Tested and working on Python 2.7.2.
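import httplib

# SOURCE_IP is not defined in the original answer; set it to the local
# address to bind to before creating any HTTPS connections.
HTTPSConnection_real = httplib.HTTPSConnection

class HTTPSConnection_monkey(HTTPSConnection_real):
    def __init__(*a, **kw):
        # self arrives as a[0], so the arguments pass through unchanged.
        HTTPSConnection_real.__init__(*a, source_address=(SOURCE_IP, 0), **kw)

httplib.HTTPSConnection = HTTPSConnection_monkey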
Answered by Andrew
As of Python 2.7, httplib.HTTPConnection accepts a source_address argument, allowing you to provide an (IP, port) pair to bind to.
See: http://docs.python.org/2/library/httplib.html#httplib.HTTPConnection
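Used directly, without urllib2, that argument might look like the following sketch. The helper name connection_from and the addresses in the usage note are placeholders chosen for illustration, and the import fallback to the Python 3 module name is an addition to the original answer.

```python
try:  # Python 2
    import httplib
except ImportError:  # Python 3 renamed the module
    import http.client as httplib


def connection_from(source_ip, host, port=80):
    """Return an unconnected HTTPConnection that will bind to source_ip."""
    # Port 0 in the source address lets the OS pick a free local port.
    return httplib.HTTPConnection(host, port, source_address=(source_ip, 0))
```

Calling conn = connection_from("192.168.1.10", "example.com") followed by conn.request("GET", "/") would open the connection from 192.168.1.10, provided that address is configured on a local interface.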