Validating URLs in Python
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/22238090/
Asked by mp94
I've been trying to figure out what the best way to validate a URL is (specifically in Python) but haven't really been able to find an answer. It seems like there isn't one known way to validate a URL, and it depends on what URLs you think you may need to validate. As well, I found it difficult to find an easy to read standard for URL structure. I did find the RFCs 3986 and 3987, but they contain much more than just how it is structured.
Am I missing something, or is there no one standard way to validate a URL?
Answer by bgschiller
This looks like it might be a duplicate of "How do you validate a URL with a regular expression in Python?"
You should be able to use the urlparse library described there.
>>> from urllib.parse import urlparse # python2: from urlparse import urlparse
>>> urlparse('actually not a url')
ParseResult(scheme='', netloc='', path='actually not a url', params='', query='', fragment='')
>>> urlparse('http://google.com')
ParseResult(scheme='http', netloc='google.com', path='', params='', query='', fragment='')
Call urlparse on the string you want to check, then make sure that the ParseResult has non-empty scheme and netloc attributes.
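Putting that check into a small helper might look like this (a minimal sketch; the function name is my own):

from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

def looks_like_url(candidate):
    # A usable URL parses into both a scheme (e.g. 'http') and a netloc (e.g. 'google.com')
    parts = urlparse(candidate)
    return bool(parts.scheme) and bool(parts.netloc)

print(looks_like_url('http://google.com'))   # True
print(looks_like_url('actually not a url'))  # False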
Answer by mdw7326
Assuming you are using Python 3, you could use urllib. The code would go something like this:
import urllib.request as req

def foo():
    url = 'http://bar.com'
    request = req.Request(url)
    try:
        response = req.urlopen(request)
        # response.read() returns the page's HTML, which you can search through
    except Exception:
        # The URL wasn't valid (or could not be reached)
        pass
If there is no error on the line "response = ..." then the URL is valid.
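Note that this approach actually fetches the page, so it checks that the URL is reachable, not just well formed. A variant of the same idea (my own sketch, not part of the original answer) can send a HEAD request so the response body is never downloaded:

import urllib.request as req

def url_is_reachable(url):
    # HEAD asks the server for headers only, avoiding a full page download
    request = req.Request(url, method='HEAD')
    try:
        req.urlopen(request, timeout=5)
        return True
    except Exception:
        # Covers URLError/HTTPError as well as ValueError for strings with no scheme
        return False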
Answer by Hamza
You can also try using urllib.request to validate, by passing the URL to the urlopen function and catching the URLError exception.
from urllib.request import urlopen
from urllib.error import URLError

def validate_web_url(url="http://google"):
    # Note: urlopen makes an actual network request, so this checks reachability as well as syntax
    try:
        urlopen(url)
        return True
    except URLError:
        return False
This would return False in this case.
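For example (illustrative calls, not part of the original answer):

print(validate_web_url('https://www.python.org'))  # True, assuming the host is reachable
print(validate_web_url())                          # False: 'http://google' typically fails to resolve, raising URLError
# Caveat: a string with no scheme at all, e.g. 'not a url', makes urlopen raise
# ValueError rather than URLError, so that case is not caught by this function.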
Answer by Chris Modzelewski
The original question is a bit old, but you might also want to look at the Validator-Collection library I released a few months back. It includes high-performing regex-based validation of URLs for compliance against the RFC standard. Some details:
- Tested against Python 2.7, 3.4, 3.5, 3.6, 3.7, and 3.8
- No dependencies on Python 3.x, one conditional dependency in Python 2.x (a drop-in replacement for Python 2.x's buggy re module)
- Unit tests that cover 100+ different succeeding/failing URL patterns, including non-standard characters and the like. As close to covering the whole spectrum of the RFC standard as I've been able to find.
It's also very easy to use:
from validator_collection import validators, checkers
checkers.is_url('http://www.stackoverflow.com')
# Returns True
checkers.is_url('not a valid url')
# Returns False
value = validators.url('http://www.stackoverflow.com')
# value set to 'http://www.stackoverflow.com'
value = validators.url('not a valid url')
# raises a validator_collection.errors.InvalidURLError (which is a ValueError)
value = validators.url('https://123.12.34.56:1234')
# value set to 'https://123.12.34.56:1234'
value = validators.url('http://10.0.0.1')
# raises a validator_collection.errors.InvalidURLError (which is a ValueError)
value = validators.url('http://10.0.0.1', allow_special_ips = True)
# value set to 'http://10.0.0.1'
In addition, Validator-Collection includes 60+ other validators, including IP addresses (IPv4 and IPv6), domains, and email addresses, so folks might find it useful.
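As a small illustrative sketch of handling the raised error (my own, relying on the comments above that InvalidURLError is a ValueError):

from validator_collection import validators

def clean_url(value):
    # validators.url raises validator_collection.errors.InvalidURLError, a ValueError
    # subclass, so catching ValueError covers invalid input
    try:
        return validators.url(value)
    except ValueError:
        return None

print(clean_url('http://www.stackoverflow.com'))  # 'http://www.stackoverflow.com'
print(clean_url('not a valid url'))               # None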

