.NET: regex for validating URIs

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/30847/

Date: 2020-09-03 09:41:02  Source: igfitidea

Regex to validate URIs

.net regex

Asked by alumb

How do you produce a regex that matches only valid URIs? The description of URIs can be found here: http://en.wikipedia.org/wiki/URI_scheme. It doesn't need to extract any parts, just test whether a URI is valid.

(preferred format is .Net RegularExpression) (.Net Version 1.1)

  • Doesn't need to check for a known protocol, just a valid one.

Current Solution:

^([a-zA-Z0-9+.-]+):(//([a-zA-Z0-9-._~!$&'()*+,;=:]*)@)?([a-zA-Z0-9-._~!$&'()*+,;=]+)(:(\d*))?(/?[a-zA-Z0-9-._~!$&'()*+,;=:/]+)?(\?[a-zA-Z0-9-._~!$&'()*+,;=:/?@]+)?(#[a-zA-Z0-9-._~!$&'()*+,;=:/?@]+)?$

Accepted answer by Daren Thomas

This site looks promising: http://snipplr.com/view/6889/regular-expressions-for-uri-validationparsing/

They propose the following regex:

/^([a-z0-9+.-]+):(?://(?:((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*)@)?((?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*)(?::(\d*))?(/(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?|(/?(?:[a-z0-9-._~!$&'()*+,;=:@]|%[0-9A-F]{2})+(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?)(?:\?((?:[a-z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?(?:#((?:[a-z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?$/i
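
A minimal sketch of how the pattern above might be exercised, written in JavaScript because the pattern is quoted with JavaScript-style /.../i delimiters. The pattern contains unescaped "/" characters, so it is handed to the RegExp constructor as a string rather than written as a literal. The names URI_PATTERN and isValidUri and the sample inputs are illustrative assumptions, not part of the linked snippet.

```javascript
// Pattern copied from the answer above. String.raw keeps the backslashes intact,
// and the "i" flag replaces the trailing /i of the original literal.
const URI_PATTERN = new RegExp(
  String.raw`^([a-z0-9+.-]+):(?://(?:((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*)@)?((?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*)(?::(\d*))?(/(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?|(/?(?:[a-z0-9-._~!$&'()*+,;=:@]|%[0-9A-F]{2})+(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?)(?:\?((?:[a-z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?(?:#((?:[a-z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?$`,
  "i"
);

// Hypothetical helper: true when the whole string matches the pattern.
function isValidUri(text) {
  return URI_PATTERN.test(text);
}

console.log(isValidUri("http://example.com/path?x=1#frag")); // hierarchical URI
console.log(isValidUri("mailto:user@example.com"));          // no "//" authority
console.log(isValidUri("not a uri"));                        // no scheme separator
```

Only a yes/no answer is used here, which is all the question asks for; the capture groups in the pattern are left unread.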

Answer by bdukes

Does Uri.IsWellFormedUriString work for you?

Answer by jcsahnwaldt says GoFundMonica

The URI specification says:

The following line is the regular expression for breaking-down a well-formed URI reference into its components.

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

(I guess that's the same regex as in the STD66 link given in another answer.)

But breaking down is not validating. To correctly validate a URI, one would have to translate the BNF for URIs to a regex. While some BNFs cannot be expressed as regular expressions, I think with this one it could be done. But it shouldn't be done - it would be a huge mess. It's better to use a library function.
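
To illustrate the difference between breaking down and validating, here is a small JavaScript sketch built on the component-splitting regex quoted from the specification. The group numbers follow RFC 3986, Appendix B (scheme = $2, authority = $4, path = $5, query = $7, fragment = $9); the names URI_PARTS and splitUri are assumptions made for the example.

```javascript
// Component-splitting regex from RFC 3986, Appendix B.
// The two "/" outside character classes are escaped so it can be a JS literal.
const URI_PARTS = /^(([^:/?#]+):)?(\/\/([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper. Every group is optional and the path group can match
// the empty string, so exec() succeeds on any input and never returns null.
function splitUri(text) {
  const m = URI_PARTS.exec(text);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9],
  };
}

// The example URI used in the RFC itself.
console.log(splitUri("http://www.ics.uci.edu/pub/ietf/uri/#Related"));

// Not a URI at all, yet the regex still "matches": everything lands in path.
console.log(splitUri("not a uri at all"));
```

Because the second call succeeds as well, a successful match says nothing about validity, which is the point the answer makes.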

Answer by papercowboy

The best and most definitive guide to this I have found is here: http://jmrware.com/articles/2009/uri_regexp/URI_regex.html (In answer to your question, see the URI table entry.)

All of these rules from RFC3986 are reproduced in Table 2 along with a regular expression implementation for each rule.

A JavaScript implementation of this is available here: https://github.com/jhermsmeier/uri.regex

For reference, the URI regex is repeated below:

# RFC-3986 URI component:  URI
[A-Za-z][A-Za-z0-9+\-.]* :                                      # scheme ":"
(?: //                                                          # hier-part
  (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)?
  (?:
    \[
    (?:
      (?:
        (?:                                                    (?:[0-9A-Fa-f]{1,4}:)    {6}
        |                                                   :: (?:[0-9A-Fa-f]{1,4}:)    {5}
        | (?:                            [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:)    {4}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:)    {3}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:)    {2}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}:
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? ::
        ) (?:
            [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}
          | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}
                (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
          )
      |   (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}
      |   (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? ::
      )
    | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+
    )
    \]
  | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
       (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
  | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*
  )
  (?: : [0-9]* )?
  (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
| /
  (?:    (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
  )?
|        (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
|
)
(?:\? (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )?   # [ "?" query ]
(?:\# (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )?   # [ "#" fragment ]
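
The pattern above is written in free-spacing style: whitespace is insignificant and "#" starts a comment. In .NET that corresponds to RegexOptions.IgnorePatternWhitespace. JavaScript has no such option, so the sketch below uses a hypothetical helper, stripFreeSpacing, to remove the layout before compiling. To keep the example short it is applied only to the IPv4 alternative of the pattern, and the trailing comments on those two lines were added for illustration.

```javascript
// Hypothetical helper: drop free-spacing whitespace and "#" comments.
// Escaped characters such as "\#" or "\." are kept untouched. Whitespace or "#"
// inside a character class is NOT special-cased; the pattern above contains none.
function stripFreeSpacing(source) {
  return source.replace(/\\[\s\S]|\s+|#.*$/gm, (m) => (m[0] === "\\" ? m : ""));
}

// The IPv4 alternative copied from the pattern above (comments are illustrative).
const IPV4_SOURCE = String.raw`
  (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}   # three octets, each followed by a dot
     (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)         # final octet
`;

// Anchor the fragment so that the whole input has to match.
const IPV4 = new RegExp("^(?:" + stripFreeSpacing(IPV4_SOURCE) + ")$");

function isIpv4(text) {
  return IPV4.test(text);
}

console.log(isIpv4("192.168.0.1")); // every octet is within 0-255
console.log(isIpv4("256.1.1.1"));   // 256 is out of range
```

The same helper could presumably be applied to the full pattern, wrapped in ^(?: ... )$ in the same way, but that combination has not been checked here.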

Answer by Lostfields

The best regex I came up with according to RFC 3986 (https://tools.ietf.org/html/rfc3986) was the following:

Flow diagram of regex using https://regexper.com

// named groups
/^(?<scheme>[a-z][a-z0-9+.-]+):(?<authority>\/\/(?<user>[^@]+@)?(?<host>[a-z0-9.\-_~]+)(?<port>:\d+)?)?(?<path>(?:[a-z0-9-._~]|%[a-f0-9]|[!$&'()*+,;=:@])+(?:\/(?:[a-z0-9-._~]|%[a-f0-9]|[!$&'()*+,;=:@])*)*|(?:\/(?:[a-z0-9-._~]|%[a-f0-9]|[!$&'()*+,;=:@])+)*)?(?<query>\?(?:[a-z0-9-._~]|%[a-f0-9]|[!$&'()*+,;=:@]|[/?])+)?(?<fragment>\#(?:[a-z0-9-._~]|%[a-f0-9]|[!$&'()*+,;=:@]|[/?])+)?$/i

// unnamed groups
/^([a-z][a-z0-9+.-]+):(\/\/([^@]+@)?([a-z0-9.\-_~]+)(:\d+)?)?((?:[a-z0-9-._~]|%[a-f0-9]|[!$&'()*+,;=:@])+(?:\/(?:[a-z0-9-._~]|%[a-f0-9]|[!$&'()*+,;=:@])*)*|(?:\/(?:[a-z0-9-._~]|%[a-f0-9]|[!$&'()*+,;=:@])+)*)?(\?(?:[a-z0-9-._~]|%[a-f0-9]|[!$&'()*+,;=:@]|[/?])+)?(\#(?:[a-z0-9-._~]|%[a-f0-9]|[!$&'()*+,;=:@]|[/?])+)?$/i

capture groups

  1. scheme
  2. authority
  3. userinfo
  4. host
  5. port
  6. path
  7. query
  8. fragment
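
A short sketch of reading the capture groups listed above, using the named-group version of the regex exactly as given. It assumes a JavaScript engine with named capture groups (ES2018 or later); the names URI_NAMED and parseUri and the sample URI are made up for the example.

```javascript
// Named-group pattern copied verbatim from the answer above.
const URI_NAMED = /^(?<scheme>[a-z][a-z0-9+.-]+):(?<authority>\/\/(?<user>[^@]+@)?(?<host>[a-z0-9.\-_~]+)(?<port>:\d+)?)?(?<path>(?:[a-z0-9-._~]|%[a-f0-9]|[!$&'()*+,;=:@])+(?:\/(?:[a-z0-9-._~]|%[a-f0-9]|[!$&'()*+,;=:@])*)*|(?:\/(?:[a-z0-9-._~]|%[a-f0-9]|[!$&'()*+,;=:@])+)*)?(?<query>\?(?:[a-z0-9-._~]|%[a-f0-9]|[!$&'()*+,;=:@]|[/?])+)?(?<fragment>\#(?:[a-z0-9-._~]|%[a-f0-9]|[!$&'()*+,;=:@]|[/?])+)?$/i;

// Hypothetical helper: returns the named groups, or null when nothing matches.
function parseUri(text) {
  const m = URI_NAMED.exec(text);
  return m ? m.groups : null;
}

const parts = parseUri("https://user@example.com:8080/path/to?x=1#top");
// Delimiters are part of each group: user keeps its "@", port its ":",
// query its "?", fragment its "#", and authority its leading "//".
console.log(parts.scheme, parts.host, parts.port, parts.path);
console.log(parseUri("no colon here")); // no scheme separator, so no match
```

One thing to watch: tracing the pattern by hand suggests that an input with a trailing slash, such as http://example.com/, still matches, but only by backtracking so that host becomes example.co and path becomes m/. The groups therefore look safer for a yes/no check than for extraction.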

Answer by Mark Biek

Are there some specific URIs you care about or are you trying to find a single regex that validates STD66?

I was going to point you to this regex for parsing a URI. You could then, in theory, check to see if all of the elements you care about are there.

But I think bdukes' answer is better.