Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/22238090/

Time: 2020-08-19 00:32:35  Source: igfitidea

Validating URLs in Python

python url url-validation

Asked by mp94

I've been trying to figure out what the best way to validate a URL is (specifically in Python) but haven't really been able to find an answer. It seems like there isn't one known way to validate a URL, and it depends on what URLs you think you may need to validate. As well, I found it difficult to find an easy to read standard for URL structure. I did find the RFCs 3986 and 3987, but they contain much more than just how it is structured.

Am I missing something, or is there no one standard way to validate a URL?

Answered by bgschiller

This looks like it might be a duplicate of "How do you validate a URL with a regular expression in Python?"

You should be able to use the urlparse library described there.

>>> from urllib.parse import urlparse # python2: from urlparse import urlparse
>>> urlparse('actually not a url')
ParseResult(scheme='', netloc='', path='actually not a url', params='', query='', fragment='')
>>> urlparse('http://google.com')
ParseResult(scheme='http', netloc='google.com', path='', params='', query='', fragment='')

Call urlparse on the string you want to check, then make sure that the ParseResult has non-empty scheme and netloc attributes.
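
The check described above could be wrapped in a small helper. This is only a minimal sketch: the function name looks_like_url and the allowed_schemes default are assumptions for illustration and are not part of the original answer. It is a loose syntactic check, not full RFC 3986 validation.

```python
from urllib.parse import urlparse


def looks_like_url(candidate, allowed_schemes=("http", "https")):
    """Return True if candidate parses with an allowed scheme and a netloc.

    Hypothetical helper: the name and the allowed_schemes default are
    assumptions made for this sketch.
    """
    try:
        parsed = urlparse(candidate)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs,
        # for example an unclosed IPv6 bracket such as 'http://[::1'
        return False
    # urlparse lowercases the scheme, so a plain membership test is enough
    return parsed.scheme in allowed_schemes and bool(parsed.netloc)


print(looks_like_url("actually not a url"))  # False
print(looks_like_url("http://google.com"))   # True
```

Restricting the scheme is a design choice: without it, a string such as 'foo://bar' would also pass, since it has both a scheme and a netloc.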

Answered by mdw7326

Assuming you are using Python 3, you could use urllib. The code would go something like this:

import urllib.error
import urllib.request as req

def foo():
    url = 'http://bar.com'
    request = req.Request(url)
    try:
        response = req.urlopen(request)
        # response is an HTTPResponse object; response.read() returns the page's HTML as bytes
        return True
    except (urllib.error.URLError, ValueError):
        # The URL was malformed or could not be opened
        return False

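If no error is raised on the line "response = ...", the URL is valid and reachable. Note that this makes a real network request, so a well-formed URL whose host is down is also reported as invalid.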

Answered by Tony Hammack

I would use the validators package. Here is the link to the documentation and installation instructions.

It is just as simple as

import validators
url = 'YOUR URL'
validators.url(url)

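It returns True if the URL is valid. If not, it returns a falsy failure object rather than the literal False, so test the result with a plain `if` instead of comparing it to False.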

Answered by Hamza

You can also try using urllib.request to validate, by passing the URL to the urlopen function and catching the URLError exception.

from urllib.error import URLError
from urllib.request import urlopen

def validate_web_url(url="http://google"):
    try:
        urlopen(url)
        return True
    except URLError:
        return False

This would return False in this case.
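
One caveat with catching only URLError: a string with no scheme at all, such as 'not a url', makes urlopen raise ValueError instead, and a call without a timeout can block for a long time. A possible variant is sketched below; the name url_is_reachable and the 5-second default timeout are assumptions for illustration, not part of the original answer.

```python
from urllib.error import URLError
from urllib.request import urlopen


def url_is_reachable(url, timeout=5):
    """Return True if urlopen can open url within timeout seconds.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    try:
        # The context manager closes the connection once the check is done
        with urlopen(url, timeout=timeout):
            return True
    except (URLError, ValueError, OSError):
        # URLError: DNS failure, refused connection, or an HTTP error status
        # ValueError: the string has no recognised scheme
        # OSError: lower-level socket problems such as a timeout while reading
        return False


print(url_is_reachable("not a url"))  # False, rejected before any network access
```

As with the function above, this tests reachability at the moment of the call, so a syntactically valid URL can still come back False. It is not exhaustive either: some malformed inputs may raise other exception types.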

Answered by Chris Modzelewski

The original question is a bit old, but you might also want to look at the Validator-Collection library I released a few months back. It includes high-performing regex-based validation of URLs for compliance against the RFC standard. Some details:

  • Tested against Python 2.7, 3.4, 3.5, 3.6, 3.7, and 3.8
  • No dependencies on Python 3.x, one conditional dependency in Python 2.x (drop-in replacement for Python 2.x's buggy re module)
  • Unit tests that cover 100+ different succeeding/failing URL patterns, including non-standard characters and the like. As close to covering the whole spectrum of the RFC standard as I've been able to find.

It's also very easy to use:

from validator_collection import validators, checkers

checkers.is_url('http://www.stackoverflow.com')
# Returns True

checkers.is_url('not a valid url')
# Returns False

value = validators.url('http://www.stackoverflow.com')
# value set to 'http://www.stackoverflow.com'

value = validators.url('not a valid url')
# raises a validator_collection.errors.InvalidURLError (which is a ValueError)

value = validators.url('https://123.12.34.56:1234')
# value set to 'https://123.12.34.56:1234'

value = validators.url('http://10.0.0.1')
# raises a validator_collection.errors.InvalidURLError (which is a ValueError)

value = validators.url('http://10.0.0.1', allow_special_ips=True)
# value set to 'http://10.0.0.1'

In addition, Validator-Collection includes 60+ other validators, including IP addresses (IPv4 and IPv6), domains, and email addresses, so it is something folks might find useful.
