Parse custom URIs with urlparse (Python)

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me). Original StackOverflow address: http://stackoverflow.com/questions/1417958/

Time: 2020-11-03 22:12:01  Source: igfitidea

python url python-2.6 urlparse

Asked by u0b34a0f6ae

My application creates custom URIs (or URLs?) to identify objects and resolve them. The problem is that Python's urlparse module refuses to parse unknown URL schemes like it parses http.

If I do not adjust urlparse's uses_* lists I get this:

>>> urlparse.urlparse("qqqq://base/id#hint")
('qqqq', '', '//base/id#hint', '', '', '')
>>> urlparse.urlparse("http://base/id#hint")
('http', 'base', '/id', '', '', 'hint')

Here is what I do, and I wonder if there is a better way to do it:

import urlparse

SCHEME = "qqqq"

# One would hope that there was a better way to do this
urlparse.uses_netloc.append(SCHEME)
urlparse.uses_fragment.append(SCHEME)

Why is there no better way to do this?

Accepted answer by Ned Batchelder

I think the problem is that URIs don't all share a common format after the scheme. For example, mailto: URLs aren't structured the same way as http: URLs.

I would use the results of the first parse, then synthesize an http url and parse it again:

parts = urlparse.urlparse("qqqq://base/id#hint")
fake_url = "http:" + parts[2]
parts2 = urlparse.urlparse(fake_url)
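
On Python 3, where `urlparse` lives in `urllib.parse`, the same re-parse idea could be wrapped in a small helper. This is only a sketch: `parse_as_http` is a hypothetical name, and it splits the scheme off by hand instead of reusing the first parse result, because newer Python versions may already separate `base` into the netloc on the first pass.

```python
from urllib.parse import urlparse


def parse_as_http(uri):
    """Parse a custom-scheme URI by borrowing http's parsing rules.

    Hypothetical helper, not part of the standard library.
    """
    # Split "qqqq://base/id#hint" into "qqqq" and "//base/id#hint".
    scheme, sep, rest = uri.partition(":")
    if not sep:
        raise ValueError("URI has no scheme: %r" % uri)
    # Parse the remainder as if it were an http URL ...
    parts = urlparse("http:" + rest)
    # ... then restore the original scheme in the result.
    return parts._replace(scheme=scheme)


parts = parse_as_http("qqqq://base/id#hint")
print(parts.scheme, parts.netloc, parts.path, parts.fragment)
```

Because the result is still a `ParseResult`, callers can keep using the usual attributes such as `netloc` and `fragment`.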

Answer by toothygoose

You can also register a custom handler with urlparse:

import urlparse

def register_scheme(scheme):
    for method in filter(lambda s: s.startswith('uses_'), dir(urlparse)):
        getattr(urlparse, method).append(scheme)

register_scheme('moose')

This will append your url scheme to the lists:

uses_fragment
uses_netloc
uses_params
uses_query
uses_relative

The uri will then be treated as http-like and will correctly return the path, fragment, username/password etc.

urlparse.urlparse('moose://username:password@hostname:port/path?query=value#fragment')._asdict()
=> {'fragment': 'fragment', 'netloc': 'username:password@hostname:port', 'params': '', 'query': 'query=value', 'path': '/path', 'scheme': 'moose'}
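
For Python 3, the same registration trick might look like the sketch below. It assumes the `uses_*` names are still module-level lists in `urllib.parse`, and it skips schemes that are already registered so repeated calls don't grow the lists. The port number 8080 is made up for illustration.

```python
import urllib.parse


def register_scheme(scheme):
    """Append `scheme` to every `uses_*` list in urllib.parse (idempotent)."""
    for name in dir(urllib.parse):
        if not name.startswith("uses_"):
            continue
        registry = getattr(urllib.parse, name)
        # Avoid duplicate entries if the function is called more than once.
        if scheme not in registry:
            registry.append(scheme)


register_scheme("moose")
result = urllib.parse.urlparse(
    "moose://username:password@hostname:8080/path?query=value#fragment"
)
print(result.username, result.hostname, result.port, result.path)
```

Note that this mutates module-level state, so it affects every caller of `urllib.parse` in the process.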

Answer by sumid

There is also a library called furl, which gives you the result you want:

>>> import furl
>>> f = furl.furl("qqqq://base/id#hint")
>>> f.scheme
'qqqq'
>>> f.host
'base'
>>> f.path
Path('/id')
>>> f.path.segments
['id']
>>> f.fragment
Fragment('hint')
>>> f.fragmentstr
'hint'

Answer by OrangeDog

The question appears to be out of date. Since at least Python 2.7 there are no issues.

Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)] on win32
>>> import urlparse
>>> urlparse.urlparse("qqqq://base/id#hint")
ParseResult(scheme='qqqq', netloc='base', path='/id', params='', query='', fragment='hint')
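
The same appears to hold on Python 3, where the module is `urllib.parse`. A quick check, offered as a sketch and not a guarantee for every version:

```python
from urllib.parse import urlparse

# Unknown scheme, no changes to the uses_* lists.
result = urlparse("qqqq://base/id#hint")
print(result.scheme, result.netloc, result.path, result.fragment)
```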

Answer by Joe Crobak

Try removing the scheme entirely, and start with //netloc, i.e.:

>>> SCHEME="qqqq"
>>> url="qqqq://base/id#hint"[len(SCHEME)+1:]
>>> url
'//base/id#hint'
>>> urlparse.urlparse(url)
('', 'base', '/id', '', '', 'hint')

You won't have the scheme in the urlparse result, but you know the scheme anyway.
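
That approach can be packaged so the scheme travels together with the parsed remainder. The sketch below uses Python 3's `urllib.parse`; `split_custom_uri` is a hypothetical helper name, not an existing API.

```python
from urllib.parse import urlparse


def split_custom_uri(uri, scheme):
    """Return (scheme, parsed_rest) for a URI known to start with `scheme:`."""
    prefix = scheme + ":"
    if not uri.startswith(prefix):
        raise ValueError("expected a %r URI, got %r" % (scheme, uri))
    # Drop "qqqq:" so urlparse sees a scheme-less "//base/id#hint".
    return scheme, urlparse(uri[len(prefix):])


scheme, parts = split_custom_uri("qqqq://base/id#hint", "qqqq")
print(scheme, parts.netloc, parts.path, parts.fragment)
```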

Also note that Python 2.6 seems to handle this url just fine (aside from the fragment):

$ python2.6 -c 'import urlparse; print urlparse.urlparse("qqqq://base/id#hint")'
ParseResult(scheme='qqqq', netloc='base', path='/id#hint', params='', query='', fragment='')

Answer by homm

You can use the yurl library. Unlike purl or furl, it does not try to fix urlparse bugs. It is a new implementation that is compatible with RFC 3986.

>>> import yurl
>>> yurl.URL('qqqq://base/id#hint')
URLBase(scheme='qqqq', userinfo=u'', host='base', port='', path='/id', query='', fragment='hint')