Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3026957/

Date: 2020-08-25 08:28:06  Source: igfitidea

How to validate a domain name using Regex & Php?

php regex preg-match

Asked by Ryan

I want a solution that validates only domain names, not full URLs. The following examples show what I'm looking for:

domain.com -> true
domain.net -> true
domain.org -> true
domain.biz -> true
domain.co.uk -> true
sub.domain.com -> true
domain.com/folder -> false
domμ*$ain.com -> false

Answered by Onur Yıldırım

The accepted answer is incomplete/wrong.

The regex pattern:

  • should NOT validate domains such as:
    -domain.com, domain--.com, -domain-.-.com, domain.000, etc.

  • should validate domains such as:
    schools.k12, newTLD.clothing, good.photography, etc.

After some further research, below is the most correct, cross-language and compact pattern I could come up with:

^(?!\-)(?:(?:[a-zA-Z\d][a-zA-Z\d\-]{0,61})?[a-zA-Z\d]\.){1,126}(?!\d+)[a-zA-Z\d]{1,63}$

This pattern conforms with most* of the rules defined in the specs:

  • Each label/level (separated by a dot) may contain up to 63 characters.
  • The full domain name may have up to 127 levels.
  • The full domain name may not exceed 253 characters in its textual representation.
  • Each label can consist of letters, digits and hyphens.
  • Labels cannot start or end with a hyphen.
  • The top-level domain (extension) cannot be all-numeric.

Note 1: The full domain length check is not included in the regex. It should simply be checked with native methods, e.g. strlen(domain) <= 253.
Note 2: This pattern works with most languages, including PHP, JavaScript, Python, etc.
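
As a rough illustration of how the pattern and the length check from Note 1 could be combined in PHP, here is a minimal sketch. The function name isDomainFormatValid() is made up for this example, and the D modifier is a PHP/PCRE-specific addition (not part of the pattern above) so that $ does not match before a trailing newline.

```php
<?php
// Sketch only: isDomainFormatValid() is a hypothetical helper name.
function isDomainFormatValid(string $domain): bool
{
    // Overall length limit from Note 1; the regex itself does not enforce it.
    if (strlen($domain) > 253) {
        return false;
    }
    // Pattern copied from the answer. The trailing "D" (PCRE_DOLLAR_ENDONLY)
    // is an assumption added here so "$" only matches at the very end.
    $pattern = '/^(?!\-)(?:(?:[a-zA-Z\d][a-zA-Z\d\-]{0,61})?[a-zA-Z\d]\.){1,126}(?!\d+)[a-zA-Z\d]{1,63}$/D';
    return preg_match($pattern, $domain) === 1;
}
```

Note that, as written, the (?!\d+) lookahead rejects any extension that begins with a digit, which is a little stricter than "not all-numeric".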

See DEMO here (for JS, PHP, Python)

More Info:

  • The regex above does not support IDNs.

  • There is no spec that says the extension (TLD) should be between 2 and 6 characters. It actually supports up to 63 characters. See the current TLD list here. Also, some networks internally use custom/pseudo TLDs.

  • Registration authorities might impose some extra, specific rules which are not explicitly supported in this regex. For example, .CO.UK and .ORG.UK names must have at least 3 characters but fewer than 23, not including the extension. These kinds of rules are non-standard and subject to change. Do not implement them if you cannot maintain them.

  • Regular expressions are great, but they are not the most effective or performant solution to every problem. So a native URL parser should be used instead whenever possible, e.g. Python's urlparse() method or PHP's parse_url() method.

  • After all, this is just a format validation. A regex test does not confirm that a domain name is actually configured or exists! You should test its existence by making a request.
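
To illustrate the point about native URL parsers, here is a small sketch that pulls the host out of a full URL with PHP's parse_url() before any format check is applied. The helper name extractHost() is hypothetical.

```php
<?php
// Sketch only: extractHost() is a hypothetical helper name.
function extractHost(string $url): ?string
{
    // Ask parse_url() for just the host component.
    $host = parse_url($url, PHP_URL_HOST);
    // parse_url() gives null when there is no host component
    // and false when the URL is seriously malformed.
    return is_string($host) ? $host : null;
}
```

Keep in mind that without a scheme, parse_url() treats input such as domain.com/folder as a path, so this helper returns null for it.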

UPDATE (2019-12-21): Fixed leading hyphen with subdomains.

Answered by zildjohn01

How about:

^(?:[-A-Za-z0-9]+\.)+[A-Za-z]{2,6}$
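
For a quick feel of how this shorter pattern behaves in PHP, here is a minimal sketch. The function name matchesSimplePattern() is made up for the example.

```php
<?php
// Sketch only: matchesSimplePattern() is a hypothetical helper name.
function matchesSimplePattern(string $domain): bool
{
    // Pattern copied verbatim from this answer.
    return preg_match('/^(?:[-A-Za-z0-9]+\.)+[A-Za-z]{2,6}$/', $domain) === 1;
}
```

It accepts domain.com and domain.co.uk and rejects domain.com/folder, but it also lets a leading hyphen through (-domain.com) and rejects extensions longer than 6 letters (good.photography).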

Answered by Web_Developer

In my case, a domain name is considered valid if its format is stackoverflow.com or xxx.stackoverflow.com.

So, in addition to the other Stack Overflow answers, I have also added a check for www.

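function isValidDomainName($domain) {
  // gethostbyname() returns the input unchanged when resolution fails,
  // so the IP filter only passes for names that actually resolve.
  if (filter_var(gethostbyname($domain), FILTER_VALIDATE_IP)) {
      // Reject a leading "www." (the dot is escaped so it matches literally).
      return !preg_match('/^www\./', $domain);
  }
  return false;
}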

You can test the function with this code:

    $domain = array("http://www.domain.com","http://www.domain.com/folder" ,"http://domain.com", "www.domain.com", "domain.com/subfolder", "domain.com","sub.domain.com");
    foreach ($domain as $v) {
        echo isValidDomainName($v) ? "{$v} is valid<br>" : "{$v} is invalid<br>";
    }

Answered by rikworkshop.com

Please try this expression:

^(http[s]?\:\/\/)?((\w+)\.)?(([\w-]+)?)(\.[\w-]+){1,2}$

What it actually does

  • optional http/s://
  • optional www
  • any valid alphanumeric name (including - and _)
  • 1 or 2 occurrences of any valid alphanumeric name (including - and _)
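
The parts listed above can be tried out with a short PHP sketch like the one below. The function name matchesUrlishPattern() is made up for the example.

```php
<?php
// Sketch only: matchesUrlishPattern() is a hypothetical helper name.
function matchesUrlishPattern(string $input): bool
{
    // Pattern copied verbatim from this answer.
    $pattern = '/^(http[s]?\:\/\/)?((\w+)\.)?(([\w-]+)?)(\.[\w-]+){1,2}$/';
    return preg_match($pattern, $input) === 1;
}
```

Because the scheme is optional and \w includes the underscore, this accepts http://www.domain.com and under_score.com as well as domain.com, so it is looser than a strict domain-name check.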

Answered by Nicholas English

I made a function to validate the domain name without using any regex.

<?php
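function validDomain($domain) {
  // Strip trailing dots so a fully qualified name such as "google.com." is accepted.
  $domain = rtrim($domain, '.');
  // Require a dot that is not the first character (mb_stripos() gives false or 0 otherwise).
  if (!mb_stripos($domain, '.')) {
    return false;
  }
  $domain = explode('.', $domain);
  $allowedChars = array('-');
  // The last label is the extension (TLD); the remaining labels are checked in the loop.
  $extension = array_pop($domain);
  foreach ($domain as $value) {
    $fc = mb_substr($value, 0, 1);
    $lc = mb_substr($value, -1);
    // A label must be non-empty and must not start or end with a hyphen.
    if (
      $value === ''
      || in_array($fc, $allowedChars)
      || in_array($lc, $allowedChars)
    ) {
      return false;
    }
    // Apart from hyphens, a label may contain only letters and digits.
    if (!ctype_alnum(str_replace($allowedChars, '', $value))) {
      return false;
    }
  }
  if (
    !ctype_alnum(str_replace($allowedChars, '', $extension))
    || $extension === ''
  ) {
    return false;
  }
  return true;
}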
$testCases = array(
  'a',
  '0',
  'a.b',
  'google.com',
  'news.google.co.uk',
  'xn--fsqu00a.xn--0zwm56d',
  'google.com ',
  'google.com.',
  'goo gle.com',
  'a.',
  'hey.hey',
  'google-.com',
  '-nj--9*.vom',
  ' ',
  '..',
  'google..com',
  'www.google.com',
  'www.google.com/some/path/to/dir/'
);
foreach ($testCases as $testCase) {
  var_dump($testCase);
  var_dump(validDomain($testCase));
  echo '<br /><br />';
}
?>

This code outputs:

string(1) "a" bool(false)

string(1) "0" bool(false)

string(3) "a.b" bool(true)

string(10) "google.com" bool(true)

string(17) "news.google.co.uk" bool(true)

string(23) "xn--fsqu00a.xn--0zwm56d" bool(true)

string(11) "google.com " bool(false)

string(11) "google.com." bool(true)

string(11) "goo gle.com" bool(false)

string(2) "a." bool(false)

string(7) "hey.hey" bool(true)

string(11) "google-.com" bool(false)

string(11) "-nj--9*.vom" bool(false)

string(1) " " bool(false)

string(2) ".." bool(false)

string(11) "google..com" bool(false)

string(14) "www.google.com" bool(true)

string(32) "www.google.com/some/path/to/dir/" bool(false)

I hope I have covered everything. If I missed something, please tell me and I can improve this function. :)

Answered by Charles

Remember, regexes can only check to see if something is well formed. "www.idonotexistbecauseiammadeuponthespot.com" is well-formed, but doesn't actually exist... at the time of writing. ;) Furthermore, certain free web hosting providers (like Tripod) allow underscores in subdomains. This is clearly a violation of the RFCs, yet it sometimes works.

Do you want to check if the domain exists? Try dns_get_record instead of (just) a regex.
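
A minimal sketch of such an existence check might look like the following. The function name domainHasDnsRecords() and its injectable $lookup parameter are assumptions made for this example (the parameter only exists so the logic can be exercised without a network); by default it calls PHP's dns_get_record().

```php
<?php
// Sketch only: domainHasDnsRecords() and $lookup are hypothetical names.
function domainHasDnsRecords(string $domain, ?callable $lookup = null): bool
{
    // Default to PHP's dns_get_record(), asking for A, AAAA and MX records.
    $lookup = $lookup ?? function (string $host) {
        // @ silences the warning PHP raises when the lookup fails.
        return @dns_get_record($host, DNS_A | DNS_AAAA | DNS_MX);
    };
    $records = $lookup($domain);
    // dns_get_record() returns false on failure and an empty array when nothing is found.
    return is_array($records) && count($records) > 0;
}
```

A real lookup depends on the resolver and network available to the server, so treat a negative result as "could not confirm" rather than proof that the domain does not exist.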