How to validate a domain name using Regex & PHP?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/3026957/
Asked by Ryan
I want a solution that validates only domain names, not full URLs. The following examples show what I'm looking for:
domain.com -> true
domain.net -> true
domain.org -> true
domain.biz -> true
domain.co.uk -> true
sub.domain.com -> true
domain.com/folder -> false
domμ*$ain.com -> false
Answered by Onur Yıldırım
The accepted answer is incomplete/wrong.
The regex pattern:
- should NOT validate domains such as: -domain.com, domain--.com, -domain-.-.com, domain.000, etc.
- should validate domains such as: schools.k12, newTLD.clothing, good.photography, etc.
After some further research, below is the most correct, cross-language and compact pattern I could come up with:
^(?!\-)(?:(?:[a-zA-Z\d][a-zA-Z\d\-]{0,61})?[a-zA-Z\d]\.){1,126}(?!\d+)[a-zA-Z\d]{1,63}$
This pattern conforms with most* of the rules defined in the specs:
- Each label/level (separated by a dot) may contain up to 63 characters.
- The full domain name may have up to 127 levels.
- The full domain name may not exceed 253 characters in its textual representation.
- Each label can consist of letters, digits and hyphens.
- Labels cannot start or end with a hyphen.
- The top-level domain (extension) cannot be all-numeric.
Note 1: The full domain length check is not included in the regex. It should simply be checked with native methods, e.g. strlen($domain) <= 253.
Note 2: This pattern works with most languages, including PHP, JavaScript, Python, etc.
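As a minimal sketch of how the pattern and the length check from Note 1 might be combined in PHP (the function name isWellFormedDomain is a hypothetical helper, not part of the original answer):

```php
<?php
// Hypothetical helper: combines the answer's pattern with the separate
// 253-character length check described in Note 1.
function isWellFormedDomain(string $domain): bool
{
    // Pattern copied from the answer above; '/' is used as the PCRE delimiter.
    $pattern = '/^(?!\-)(?:(?:[a-zA-Z\d][a-zA-Z\d\-]{0,61})?[a-zA-Z\d]\.){1,126}(?!\d+)[a-zA-Z\d]{1,63}$/';

    // The regex does not enforce the overall length, so check it natively.
    if (strlen($domain) > 253) {
        return false;
    }

    return preg_match($pattern, $domain) === 1;
}

// Print the verdict for a few of the inputs discussed on this page.
$candidates = ['domain.com', 'schools.k12', '-domain.com', 'domain--.com', 'domain.000', 'domain.com/folder'];
foreach ($candidates as $candidate) {
    echo $candidate, ' -> ', isWellFormedDomain($candidate) ? 'true' : 'false', PHP_EOL;
}
```

One detail worth knowing: the lookahead (?!\d+) as written rejects any final label that begins with a digit, which is slightly stricter than "not all-numeric".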
More Info:
- The regex above does not support IDNs.
- There is no spec that says the extension (TLD) should be between 2 and 6 characters. It actually supports up to 63 characters. See the current TLD list. Also, some networks internally use custom/pseudo TLDs.
- Registration authorities might impose some extra, specific rules which are not explicitly supported in this regex. For example, .CO.UK and .ORG.UK names must have at least 3 characters but fewer than 23, not including the extension. These kinds of rules are non-standard and subject to change. Do not implement them if you cannot maintain them.
- Regular expressions are great but not the most effective or performant solution to every problem. So a native URL parser should be used instead whenever possible, e.g. Python's urlparse() method or PHP's parse_url() function.
- After all, this is just a format validation. A regex test does not confirm that a domain name is actually configured or exists! You should test for existence by making a request.
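To illustrate the parse_url() suggestion, here is a rough sketch that pulls the host out of a URL-like string so that only the host needs a format check afterwards. The helper name extractHost and the choice to prepend "http://" to scheme-less input are assumptions made for this example:

```php
<?php
// Hypothetical helper: returns the host part of a URL-like string, or null.
function extractHost(string $input): ?string
{
    // parse_url() only reports a host when a scheme (or leading "//") is
    // present, so a scheme is prepended for bare inputs like "domain.com/folder".
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $input)) {
        $input = 'http://' . $input;
    }

    $host = parse_url($input, PHP_URL_HOST);

    // parse_url() returns null when the component is missing and false on
    // seriously malformed input; both are normalised to null here.
    return (is_string($host) && $host !== '') ? $host : null;
}

var_dump(extractHost('http://www.domain.com/folder'));
var_dump(extractHost('domain.com/folder'));
```

The extracted host still needs its own validation; parse_url() splits a URL into parts but does not check that the host is a well-formed domain name.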
UPDATE (2019-12-21): Fixed leading hyphen with subdomains.
Answered by zildjohn01
How about:
^(?:[-A-Za-z0-9]+\.)+[A-Za-z]{2,6}$
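For comparison, a quick sketch that applies this shorter pattern with preg_match() to a few of the inputs discussed on this page (the helper name matchesSimplePattern is hypothetical):

```php
<?php
// Hypothetical helper wrapping the pattern from this answer.
function matchesSimplePattern(string $domain): bool
{
    return preg_match('/^(?:[-A-Za-z0-9]+\.)+[A-Za-z]{2,6}$/', $domain) === 1;
}

// Print the verdict for each sample input.
$samples = ['domain.com', 'sub.domain.co.uk', 'domain.com/folder', '-domain.com', 'good.photography'];
foreach ($samples as $sample) {
    echo $sample, ' -> ', matchesSimplePattern($sample) ? 'true' : 'false', PHP_EOL;
}
```

It handles the examples from the question, but it also accepts labels with a leading hyphen and rejects TLDs longer than six letters or containing digits.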
Answered by Web_Developer
In my case, a domain name is considered valid if the format is stackoverflow.com or xxx.stackoverflow.com.
So, in addition to the other answers, I have also added a check for "www.".
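function isValidDomainName($domain) {
    // gethostbyname() returns the input unchanged when the lookup fails,
    // so filter_var() only succeeds for names that actually resolve.
    if (filter_var(gethostbyname($domain), FILTER_VALIDATE_IP)) {
        // Reject names starting with "www." (dot escaped so it matches literally).
        return (preg_match('/^www\./', $domain)) ? FALSE : TRUE;
    }
    return FALSE;
}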
You can test the function with this code:
$domain = array("http://www.domain.com", "http://www.domain.com/folder", "http://domain.com", "www.domain.com", "domain.com/subfolder", "domain.com", "sub.domain.com");
foreach ($domain as $v) {
    echo isValidDomainName($v) ? "{$v} is valid<br>" : "{$v} is invalid<br>";
}
Answered by rikworkshop.com
Please try this expression:
^(http[s]?\:\/\/)?((\w+)\.)?(([\w-]+)?)(\.[\w-]+){1,2}$
What it actually does:
- optional http/s://
- optional www
- any valid alphanumeric name (including - and _)
- 1 or 2 occurrences of any valid alphanumeric name (including - and _)
Validation examples:
- http://www.test.com
- test.com.mt
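A small sketch of how this expression could be exercised in PHP. The helper name matchesUrlishPattern is hypothetical, and "~" is used as the delimiter only so that the escaped slashes can be kept exactly as written:

```php
<?php
// Hypothetical helper wrapping the expression from this answer.
function matchesUrlishPattern(string $input): bool
{
    $pattern = '~^(http[s]?\:\/\/)?((\w+)\.)?(([\w-]+)?)(\.[\w-]+){1,2}$~';

    return preg_match($pattern, $input) === 1;
}

// Print the verdict for each sample input.
$samples = ['http://www.test.com', 'test.com.mt', 'domain.com/folder'];
foreach ($samples as $sample) {
    echo $sample, ' -> ', matchesUrlishPattern($sample) ? 'match' : 'no match', PHP_EOL;
}
```

Because the scheme is optional, inputs such as http://domain.com also match, so this expression validates more than bare domain names.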
Answered by Nicholas English
I made a function to validate the domain name without any regex.
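<?php
function validDomain($domain) {
    // A single trailing dot (fully qualified form) is allowed.
    $domain = rtrim($domain, '.');
    // Require at least one dot that is not the first character.
    if (!mb_stripos($domain, '.')) {
        return false;
    }
    $domain = explode('.', $domain);
    $allowedChars = array('-');
    $extension = array_pop($domain);
    foreach ($domain as $value) {
        $fc = mb_substr($value, 0, 1);
        $lc = mb_substr($value, -1);
        // Labels must be non-empty and must not start or end with a hyphen.
        if (
            $value === ''
            || in_array($fc, $allowedChars)
            || in_array($lc, $allowedChars)
        ) {
            return false;
        }
        // Apart from hyphens, labels may contain only letters and digits.
        if (!ctype_alnum(str_replace($allowedChars, '', $value))) {
            return false;
        }
    }
    // The extension follows the same character rule and must not be empty.
    if (
        !ctype_alnum(str_replace($allowedChars, '', $extension))
        || $extension === ''
    ) {
        return false;
    }
    return true;
}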
$testCases = array(
    'a',
    '0',
    'a.b',
    'google.com',
    'news.google.co.uk',
    'xn--fsqu00a.xn--0zwm56d',
    'google.com ',
    'google.com.',
    'goo gle.com',
    'a.',
    'hey.hey',
    'google-.com',
    '-nj--9*.vom',
    ' ',
    '..',
    'google..com',
    'www.google.com',
    'www.google.com/some/path/to/dir/'
);
foreach ($testCases as $testCase) {
    var_dump($testCase);
    // PHP variable names are case-sensitive, so this must be $testCase.
    var_dump(validDomain($testCase));
    echo '<br /><br />';
}
?>
This code outputs:
string(1) "a" bool(false)
string(1) "0" bool(false)
string(3) "a.b" bool(true)
string(10) "google.com" bool(true)
string(17) "news.google.co.uk" bool(true)
string(23) "xn--fsqu00a.xn--0zwm56d" bool(true)
string(11) "google.com " bool(false)
string(11) "google.com." bool(true)
string(11) "goo gle.com" bool(false)
string(2) "a." bool(false)
string(7) "hey.hey" bool(true)
string(11) "google-.com" bool(false)
string(11) "-nj--9*.vom" bool(false)
string(1) " " bool(false)
string(2) ".." bool(false)
string(11) "google..com" bool(false)
string(14) "www.google.com" bool(true)
string(32) "www.google.com/some/path/to/dir/" bool(false)
I hope I have covered everything. If I missed something, please tell me and I can improve this function. :)
Answered by Charles
Remember, regexes can only check whether something is well-formed. "www.idonotexistbecauseiammadeuponthespot.com" is well-formed, but doesn't actually exist... at the time of writing. ;) Furthermore, certain free web hosting providers (like Tripod) allow underscores in subdomains. This is clearly a violation of the RFCs, yet it sometimes works.
Do you want to check if the domain exists? Try dns_get_record instead of (just) a regex.
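A rough sketch of what such an existence check might look like with dns_get_record(). The helper name domainResolves, the cheap character pre-check, and the choice to look only at A and AAAA records are all assumptions made for this example:

```php
<?php
// Hypothetical helper: true when the name has at least one A or AAAA record.
function domainResolves(string $domain): bool
{
    // Skip the lookup for input that is obviously not a bare host name.
    // Underscores are let through because some hosts use them in subdomains.
    if ($domain === '' || preg_match('/[^A-Za-z0-9._-]/', $domain)) {
        return false;
    }

    foreach ([DNS_A, DNS_AAAA] as $type) {
        // @ suppresses the warning dns_get_record() may raise when a query fails.
        $records = @dns_get_record($domain, $type);
        if (is_array($records) && count($records) > 0) {
            return true;
        }
    }

    return false;
}
```

Calling domainResolves('stackoverflow.com') performs a live DNS query, so the result depends on the resolver and network available at call time.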

