Parsing custom URIs with urlparse (Python)

Question

Asked by u0b34a0f6ae

My application creates custom URIs (or URLs?) to identify objects and resolve them. The problem is that Python's urlparse module refuses to parse unknown URL schemes like it parses http.

If I do not adjust urlparse's uses_* lists I get this:

>>> urlparse.urlparse("qqqq://base/id#hint")
('qqqq', '', '//base/id#hint', '', '', '')
>>> urlparse.urlparse("http://base/id#hint")
('http', 'base', '/id', '', '', 'hint')

Here is what I do, and I wonder if there is a better way to do it:

import urlparse

SCHEME = "qqqq"

# One would hope that there was a better way to do this
urlparse.uses_netloc.append(SCHEME)
urlparse.uses_fragment.append(SCHEME)

Why is there no better way to do this?

Answer 1

Accepted answer by Ned Batchelder

I think the problem is that URIs don't all have a common format after the scheme. For example, mailto: URLs aren't structured the same way as http: URLs.

I would use the results of the first parse, then synthesize an http url and parse it again:

import urlparse

parts = urlparse.urlparse("qqqq://base/id#hint")
# parts[2] is '//base/id#hint' here, so this yields "http://base/id#hint".
fake_url = "http:" + parts[2]
parts2 = urlparse.urlparse(fake_url)
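
The same idea can be written so that it does not depend on how the first parse treats an unknown scheme. This is a minimal sketch, assuming Python 3, where the module lives at urllib.parse. The helper name parse_as_http is hypothetical and not part of the standard library. It splits off the scheme by hand, parses the rest as an http URL, and then puts the original scheme back into the result.

```python
from urllib.parse import urlparse


def parse_as_http(url):
    """Parse a custom-scheme URL by temporarily pretending it is http.

    Hypothetical helper for illustration; not part of urllib.
    """
    # Everything before the first ':' is taken to be the scheme.
    scheme, sep, rest = url.partition(":")
    if not sep:
        raise ValueError("no scheme found in %r" % (url,))
    # Parse the remainder under a scheme the parser is known to handle.
    parts = urlparse("http:" + rest)
    # ParseResult is a namedtuple, so _replace returns a copy with the
    # original scheme restored.
    return parts._replace(scheme=scheme)


parts = parse_as_http("qqqq://base/id#hint")
print(parts.netloc, parts.path, parts.fragment)  # base /id hint
```

Restoring the scheme with _replace keeps the result self-describing, so callers do not need to remember which scheme was swapped out.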

Answer 2

Answered by toothygoose

You can also register a custom handler with urlparse:

import urlparse

def register_scheme(scheme):
    for method in filter(lambda s: s.startswith('uses_'), dir(urlparse)):
        getattr(urlparse, method).append(scheme)

register_scheme('moose')

This will append your url scheme to the lists:

uses_fragment
uses_netloc
uses_params
uses_query
uses_relative

The URI will then be treated as http-like, and urlparse will correctly return the path, fragment, username/password, etc.

urlparse.urlparse('moose://username:password@hostname:port/path?query=value#fragment')._asdict()
=> {'fragment': 'fragment', 'netloc': 'username:password@hostname:port', 'params': '', 'query': 'query=value', 'path': '/path', 'scheme': 'moose'}
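
A slightly more defensive variant of the registration above is sketched below. It assumes Python 3, where the same uses_* lists are module attributes of urllib.parse. Those lists are undocumented, so treat this as a sketch rather than a supported API. The function name register_scheme is kept from the answer, and the numeric port 8080 is an illustrative value.

```python
import urllib.parse


def register_scheme(scheme):
    """Add scheme to every uses_* list in urllib.parse, without duplicates."""
    for name in dir(urllib.parse):
        if not name.startswith("uses_"):
            continue
        registry = getattr(urllib.parse, name)
        # Only touch real lists, and keep repeated calls idempotent.
        if isinstance(registry, list) and scheme not in registry:
            registry.append(scheme)


register_scheme("moose")

parts = urllib.parse.urlparse(
    "moose://username:password@hostname:8080/path?query=value#fragment"
)
# The userinfo and port are exposed as attributes of the parse result.
print(parts.username, parts.password, parts.hostname, parts.port)
```

The port has to be numeric for parts.port to work. With the literal string port used in the example above, reading parts.port raises ValueError, although netloc is still returned as-is.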

Answer 3

Answered by sumid

There is also a library called furl, which gives you the result you want:

>>> import furl
>>> f = furl.furl("qqqq://base/id#hint")
>>> f.scheme
'qqqq'
>>> f.host
'base'
>>> f.path
Path('/id')
>>> f.path.segments
['id']
>>> f.fragment
Fragment('hint')
>>> f.fragmentstr
'hint'

Answer 4

Answered by OrangeDog

The question appears to be out of date. Since at least Python 2.7, urlparse splits a URL with an unknown scheme the same way it splits an http URL.

Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)] on win32
>>> import urlparse
>>> urlparse.urlparse("qqqq://base/id#hint")
ParseResult(scheme='qqqq', netloc='base', path='/id', params='', query='', fragment='hint')

Answer 5

Answered by Joe Crobak

Try removing the scheme entirely, and start with //netloc, i.e.:

>>> import urlparse
>>> SCHEME = "qqqq"
>>> url = "qqqq://base/id#hint"[len(SCHEME) + 1:]
>>> url
'//base/id#hint'
>>> urlparse.urlparse(url)
('', 'base', '/id', '', '', 'hint')

You won't have the scheme in the urlparse result, but you know the scheme anyway.

Also note that Python 2.6 seems to handle this url just fine (aside from the fragment):

$ python2.6 -c 'import urlparse; print urlparse.urlparse("qqqq://base/id#hint")'
ParseResult(scheme='qqqq', netloc='base', path='/id#hint', params='', query='', fragment='')
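
The scheme-stripping approach can be wrapped in a small function that also checks the scheme before discarding it. This is a minimal sketch, assuming Python 3 (urllib.parse). The name parse_without_scheme is hypothetical.

```python
from urllib.parse import urlparse

SCHEME = "qqqq"


def parse_without_scheme(url, scheme=SCHEME):
    """Strip a known scheme and parse the rest as a scheme-relative URL."""
    prefix = scheme + ":"
    if not url.startswith(prefix):
        raise ValueError("expected a %s URL, got %r" % (scheme, url))
    # What remains starts with '//', which the parser reads as a netloc.
    return urlparse(url[len(prefix):])


parts = parse_without_scheme("qqqq://base/id#hint")
print(parts.netloc, parts.path, parts.fragment)  # base /id hint
```

Checking the prefix first means a URL with some other scheme fails loudly instead of being sliced at the wrong offset.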

Answer 6

Answered by homm

You can use the yurl library. Unlike purl or furl, it does not try to fix urlparse bugs. It is a new implementation that is compatible with RFC 3986.

>>> import yurl
>>> yurl.URL('qqqq://base/id#hint')
URLBase(scheme='qqqq', userinfo=u'', host='base', port='', path='/id', query='', fragment='hint')
