Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/21628852/
Changing hostname in a url
Asked by Endling
I am trying to use python to change the hostname in a url, and have been playing around with the urlparse module for a while now without finding a satisfactory solution. As an example, consider the url:
https://www.google.dk:80/barbaz
I would like to replace "www.google.dk" with e.g. "www.foo.dk", so I get the following url:
https://www.foo.dk:80/barbaz
So the part I want to replace is what urlparse.urlsplit refers to as hostname. I had hoped that the result of urlsplit would let me make changes, but the resulting type ParseResult doesn't allow me to. If nothing else I can of course reconstruct the new url by appending all the parts together with +, but this would leave me with some quite ugly code with a lot of conditionals to get "://" and ":" in the correct places.
Accepted answer by Nigel Tufnel
You can use the urllib.parse.urlparse function and the ParseResult._replace method (Python 3):
>>> import urllib.parse
>>> parsed = urllib.parse.urlparse("https://www.google.dk:80/barbaz")
>>> replaced = parsed._replace(netloc="www.foo.dk:80")
>>> print(replaced)
ParseResult(scheme='https', netloc='www.foo.dk:80', path='/barbaz', params='', query='', fragment='')
If you're using Python 2, replace urllib.parse with urlparse.
ParseResult is a subclass of namedtuple, and _replace is a namedtuple method that:
returns a new instance of the named tuple replacing specified fields with new values
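Since _replace returns a new tuple rather than a string, a small sketch may help show how to get back to a URL. It assumes Python 3 and uses the geturl() method that also appears in later answers; the variable names are only illustrative.

```python
from urllib.parse import urlparse

parsed = urlparse("https://www.google.dk:80/barbaz")

# _replace returns a NEW ParseResult; the original tuple is left untouched.
replaced = parsed._replace(netloc="www.foo.dk:80")

# geturl() reassembles the fields into a URL string.
new_url = replaced.geturl()
print(new_url)
print(parsed.netloc)  # still the old netloc
```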
UPDATE:
As @2rs2ts pointed out in a comment, the netloc attribute includes the port number.
Good news: ParseResult has hostname and port attributes.
Bad news: hostname and port are not fields of the namedtuple; they're dynamic properties, so you can't do parsed._replace(hostname="www.foo.dk"). It will throw an exception.
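A quick way to see this for yourself is the sketch below (Python 3 assumed). It catches both ValueError and TypeError, because the exact exception type raised by namedtuple's _replace for an unknown field has differed between Python versions.

```python
from urllib.parse import urlparse

parsed = urlparse("https://www.google.dk:80/barbaz")

# hostname and port are readable as properties...
host, port = parsed.hostname, parsed.port

# ...but they are not namedtuple fields, so _replace rejects them.
error = None
try:
    parsed._replace(hostname="www.foo.dk")
except (ValueError, TypeError) as exc:
    error = exc

print(host, port, type(error).__name__)
```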
If you don't want to split on : and your url always has a port number and doesn't have a username and password (that is, urls like "https://username:password@www.google.dk:80/barbaz"), you can do:
parsed._replace(netloc="{}:{}".format("www.foo.dk", parsed.port))
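For urls that may or may not carry a port or credentials, one possible approach is to rebuild netloc from the parsed pieces. The following is a minimal sketch, assuming Python 3; replace_hostname is a hypothetical helper name, not part of urllib, and it assumes new_host is a plain host name (an IPv6 literal would need its own square brackets).

```python
from urllib.parse import urlsplit, urlunsplit


def replace_hostname(url, new_host):
    """Hypothetical helper: swap the host, keep userinfo and port if present."""
    parts = urlsplit(url)
    netloc = new_host
    # Re-attach the port only when the original url had one.
    if parts.port is not None:
        netloc = "{}:{}".format(netloc, parts.port)
    # Re-attach "user" or "user:password" only when present.
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo = "{}:{}".format(userinfo, parts.password)
        netloc = "{}@{}".format(userinfo, netloc)
    return urlunsplit(parts._replace(netloc=netloc))


print(replace_hostname("https://www.google.dk:80/barbaz", "www.foo.dk"))
print(replace_hostname("https://user:pw@www.google.dk/barbaz", "www.foo.dk"))
```

This avoids any manual splitting on ":" or "@", at the cost of a few conditionals.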
Answer by linkyndy
You can take advantage of urlsplit and urlunsplit from Python's urlparse module:
>>> from urlparse import urlsplit, urlunsplit
>>> url = list(urlsplit('https://www.google.dk:80/barbaz'))
>>> url
['https', 'www.google.dk:80', '/barbaz', '', '']
>>> url[1] = 'www.foo.dk:80'
>>> new_url = urlunsplit(url)
>>> new_url
'https://www.foo.dk:80/barbaz'
As the docs state, the argument passed to urlunsplit() "can be any five-item iterable", so the above code works as expected.
Answer by Alfe
To replace just the host without touching the port in use (if any), use this:
import re, urlparse
p = list(urlparse.urlsplit('https://www.google.dk:80/barbaz'))
p[1] = re.sub('^[^:]*', 'www.foo.dk', p[1])
print urlparse.urlunsplit(p)
prints
https://www.foo.dk:80/barbaz
If the url has no port, this works fine as well.
If you prefer the _replace way Nigel pointed out, you can use this instead:
p = urlparse.urlsplit('https://www.google.dk:80/barbaz')
p = p._replace(netloc=re.sub('^[^:]*', 'www.foo.dk', p.netloc))
print urlparse.urlunsplit(p)
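Note that the pattern '^[^:]*' stops at the first colon, so on a netloc with credentials such as 'user:pw@www.google.dk:80' it would replace the user name rather than the host. Below is a hedged Python 3 sketch of the same regex idea that skips an optional "userinfo@" prefix; swap_host is a hypothetical name, and the pattern still assumes the host is not an IPv6 literal (which itself contains colons).

```python
import re
from urllib.parse import urlsplit, urlunsplit


def swap_host(url, new_host):
    """Hypothetical helper: regex-based host swap that tolerates userinfo."""
    parts = urlsplit(url)
    # Group 1 captures an optional "userinfo@" prefix; the rest of the
    # pattern consumes the host up to (not including) the port colon.
    netloc = re.sub(
        r'^(.*@)?[^:]*',
        lambda m: (m.group(1) or '') + new_host,
        parts.netloc,
        count=1,
    )
    return urlunsplit(parts._replace(netloc=netloc))


print(swap_host('https://www.google.dk:80/barbaz', 'www.foo.dk'))
print(swap_host('https://user:pw@www.google.dk:80/barbaz', 'www.foo.dk'))
```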
Answer by Omid Raha
Using the urlparse and urlunparse functions of the urlparse module:
import urlparse
old_url = 'https://www.google.dk:80/barbaz'
url_lst = list(urlparse.urlparse(old_url))
# Now url_lst is ['https', 'www.google.dk:80', '/barbaz', '', '', '']
url_lst[1] = 'www.foo.dk:80'
# Now url_lst is ['https', 'www.foo.dk:80', '/barbaz', '', '', '']
new_url = urlparse.urlunparse(url_lst)
print(old_url)
print(new_url)
Output:
https://www.google.dk:80/barbaz
https://www.foo.dk:80/barbaz
Answer by David Morley
A simple string replace of the host in the netloc also works in most cases:
>>> import urlparse  # Python 2; in Python 3 use: from urllib import parse as urlparse
>>> p = urlparse.urlparse('https://www.google.dk:80/barbaz')
>>> p._replace(netloc=p.netloc.replace(p.hostname, 'www.foo.dk')).geturl()
'https://www.foo.dk:80/barbaz'
This will not work if, by some chance, the user name or password matches the hostname. You cannot limit str.replace to the last occurrence only, so instead we can use rsplit and join:
>>> p = urlparse.urlparse('https://www.google.dk:www.google.dk@www.google.dk:80/barbaz')
>>> new_netloc = 'www.foo.dk'.join(p.netloc.rsplit(p.hostname, 1))
>>> p._replace(netloc=new_netloc).geturl()
'https://www.google.dk:www.google.dk@www.foo.dk:80/barbaz'
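One more caveat with this approach: the hostname property is always lower-cased, while netloc keeps the original casing, so a url written with an upper-case host would not match. The sketch below (Python 3; replace_last_host is a hypothetical name) searches a lower-cased copy of netloc instead. It assumes an ASCII host name, so that lower-casing does not change string length, and like the rsplit version it simply targets the last occurrence of the host text.

```python
from urllib.parse import urlparse


def replace_last_host(url, new_host):
    """Hypothetical helper: replace the last occurrence of the host in netloc."""
    p = urlparse(url)
    if p.hostname is None:
        return url  # nothing to replace
    # p.hostname is lower-cased, so look it up in a lower-cased netloc.
    idx = p.netloc.lower().rfind(p.hostname)
    new_netloc = p.netloc[:idx] + new_host + p.netloc[idx + len(p.hostname):]
    return p._replace(netloc=new_netloc).geturl()


print(replace_last_host('https://WWW.Google.DK:80/barbaz', 'www.foo.dk'))
```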
Answer by eLRuLL
I would also recommend using urlsplit and urlunsplit, as in @linkyndy's answer, but for Python 3 it would be:
>>> from urllib.parse import urlsplit, urlunsplit
>>> url = list(urlsplit('https://www.google.dk:80/barbaz'))
>>> url
['https', 'www.google.dk:80', '/barbaz', '', '']
>>> url[1] = 'www.foo.dk:80'
>>> new_url = urlunsplit(url)
>>> new_url
'https://www.foo.dk:80/barbaz'
Answer by Facundo Batista
You can always do this trick:
>>> from urllib import parse
>>> p = parse.urlparse("https://stackoverflow.com/questions/21628852/changing-hostname-in-a-url")
>>> parse.ParseResult(**dict(p._asdict(), netloc='perrito.com.ar')).geturl()
'https://perrito.com.ar/questions/21628852/changing-hostname-in-a-url'

