Original URL: http://stackoverflow.com/questions/3211411/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
How to get the base domain name from a URL using PHP?
Question by Rohan
I need to get the domain name from a URL. The following examples should all return google.com:
google.com
images.google.com
new.images.google.com
www.google.com
Similarly the following URLs should all return google.co.uk.
google.co.uk
images.google.co.uk
new.images.google.co.uk
http://www.google.co.uk
I'm hesitant to use regular expressions, because something like domain.com/google.com could return incorrect results.
How can I get the top-level domain using PHP? This needs to work on all platforms and hosts.
Answer by xil3
You could do this:
$urlData = parse_url($url);
$host = $urlData['host'];
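As a rough sketch of the parse_url() approach above: parse_url() only fills in 'host' when the string has a scheme, so bare hosts like the ones in the question come back as a path instead. The helper name hostFromUrl() is hypothetical, and prepending "http://" when no scheme is present is an assumption about the input. It also shows that parse_url(), unlike a naive regex, reads domain.com/google.com as the host domain.com. Note that it still returns the full host, not the base domain.

```php
<?php
// Hypothetical helper: parse_url() only reports a 'host' when the input has a
// scheme, so add "http://" if none is present, then return the lowercased host
// (or null when there is no usable host).
function hostFromUrl(string $url): ?string
{
    if (strpos($url, '://') === false) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) && $host !== '' ? strtolower($host) : null;
}

echo hostFromUrl('new.images.google.co.uk'), "\n"; // new.images.google.co.uk
echo hostFromUrl('http://www.google.co.uk'), "\n"; // www.google.co.uk
echo hostFromUrl('domain.com/google.com'), "\n";   // domain.com
```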
** Update **
The best way I can think of is to have a mapping of all the TLDs that you want to handle, since certain TLDs can be tricky (co.uk).
// you can add more to it if you want
$urlMap = array('com', 'co.uk');
$host = "";
$url = "http://www.google.co.uk";
$urlData = parse_url($url);
$hostData = explode('.', $urlData['host']);
$hostData = array_reverse($hostData);
if(array_search($hostData[1] . '.' . $hostData[0], $urlMap) !== FALSE) {
    $host = $hostData[2] . '.' . $hostData[1] . '.' . $hostData[0];
} elseif(array_search($hostData[0], $urlMap) !== FALSE) {
    $host = $hostData[1] . '.' . $hostData[0];
}
echo $host;
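As a sketch, the mapping idea can be generalized to a longest-suffix match. This handles suffixes of any depth (co.uk, com.au) the same way instead of hard-coding a two-label check and a one-label check. The function name baseDomainFromSuffixes() and the sample suffix list are illustrative assumptions, not a complete list, and the host is assumed to already come from parse_url().

```php
<?php
// Hypothetical helper generalizing the TLD map: find the longest entry in
// $suffixes that the host ends with and return that suffix plus one more label.
function baseDomainFromSuffixes(string $host, array $suffixes): ?string
{
    $labels = explode('.', strtolower($host));
    $n = count($labels);
    // $i = 0 tests the whole host, so the longest matching suffix is found first.
    for ($i = 0; $i < $n; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            // If the whole host is itself a suffix (e.g. "co.uk"), there is no base domain.
            return $i === 0 ? null : $labels[$i - 1] . '.' . $candidate;
        }
    }
    return null; // no listed suffix matched
}

$suffixes = ['com', 'uk', 'co.uk', 'com.au']; // sample map; extend as needed
echo baseDomainFromSuffixes('new.images.google.co.uk', $suffixes), "\n"; // google.co.uk
echo baseDomainFromSuffixes('www.google.com', $suffixes), "\n";          // google.com
echo baseDomainFromSuffixes('somedomain.com.au', $suffixes), "\n";       // somedomain.com.au
```

Because longer suffixes are tested first, listing both uk and co.uk is safe: co.uk wins for google.co.uk.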
Answer by aequalsb
Top-level domains and second-level domains may be 2 characters long, but a registered subdomain must be at least 3 characters long.
EDIT: Thanks to pjv's comment, I learned that Australian domain names are an exception, because they allow 5 TLDs as SLDs (com, net, org, asn, id), for example somedomain.com.au. I'm guessing com.au is a nationally controlled domain name that is "shared". So, technically, "com.au" would still be the "base domain", but that's not useful.
EDIT: There are 47,952 possible three-character domain names (pattern: [a-zA-Z0-9][a-zA-Z0-9-][a-zA-Z0-9]; domain names are case-insensitive, so each end position has 26 letters + 10 digits = 36 choices and the middle position also allows a hyphen, giving 36 * 37 * 36). Combined with just 8 of the most common TLDs (com, org, etc.), that gives 383,616 possibilities, without even counting the full range of TLDs. 1-letter and 2-letter domain names still exist, but are not valid going forward.
In google.com, "google" is a subdomain of "com".
In google.co.uk, "google" is a subdomain of "co", which in turn is a subdomain of "uk" (or really a second-level domain, since "co" is also a valid top-level domain).
In www.google.com, "www" is a subdomain of "google", which is a subdomain of "com".
"co.uk" is NOT a valid host because there is no valid domain name
Going with that assumption, this function will return the proper "base domain" in almost all cases, without requiring a "URL map".
If you happen to be one of the rare cases, perhaps you can modify it to fit your particular needs.
EDIT: You must pass the domain string as a URL with its protocol (http://, ftp://, etc.); otherwise parse_url() treats the string as a path and returns no host (unless you modify the code to behave differently).
function basedomain( $str = '' )
{
    // $str must be passed WITH protocol. ex: http://domain.com
    $url = @parse_url( $str );
    if ( empty( $url['host'] ) ) return;
    $parts = explode( '.', $url['host'] );
    $count = count( $parts );
    // keep 3 labels when the second-to-last label is 2 characters (e.g. "co" in google.co.uk), otherwise 2
    // (indexing directly avoids passing array_slice() to reset(), which raises a by-reference notice)
    $slice = ( $count > 2 && strlen( $parts[$count - 2] ) == 2 ) ? 3 : 2;
    return implode( '.', array_slice( $parts, 0 - $slice ) );
}
If you need to be accurate, use fopen or curl to open this URL:
http://data.iana.org/TLD/tlds-alpha-by-domain.txt
Then read the lines into an array and use that to compare the domain parts.
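A minimal sketch of reading the IANA file into an array and comparing against it, assuming you have already fetched the file's text. The fetch is omitted so the example runs offline. parseIanaTlds() and hasKnownTld() are hypothetical names. This file lists only top-level domains such as UK, not co.uk, so it can confirm the last label but cannot by itself tell you where the base domain starts.

```php
<?php
// Parse the text of tlds-alpha-by-domain.txt into a lookup set. Fetching the
// file (fopen/file_get_contents or curl) is left out; the text is passed in.
function parseIanaTlds(string $text): array
{
    $tlds = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and the "# Version ..." header
        }
        $tlds[strtolower($line)] = true; // array keys give constant-time lookups
    }
    return $tlds;
}

// Hypothetical helper: true if the host's last label is a delegated TLD.
function hasKnownTld(string $host, array $tlds): bool
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    return isset($tlds[end($labels)]);
}

// Trimmed sample in the file's format (the real file lists every TLD).
$tlds = parseIanaTlds("# sample header\nCOM\nUK\nAU\n");
var_dump(hasKnownTld('www.google.co.uk', $tlds)); // bool(true)
var_dump(hasKnownTld('intranet.local', $tlds));   // bool(false)
```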
EDIT: to allow for Australian domains:
function au_basedomain( $str = '' )
{
    // $str must be passed WITH protocol. ex: http://domain.com
    $url = @parse_url( $str );
    if ( empty( $url['host'] ) ) return;
    $parts = explode( '.', $url['host'] );
    $count = count( $parts );
    // keep 3 labels when the second-to-last label is 2 characters (e.g. "co" in google.co.uk), otherwise 2
    $slice = ( $count > 2 && strlen( $parts[$count - 2] ) == 2 ) ? 3 : 2;
    // Australian second-level domains (com.au, net.au, asn.au, org.au, id.au) also need 3 labels
    if ( preg_match( '/\.(com|net|asn|org|id)\.au$/i', $url['host'] ) ) $slice = 3;
    return implode( '.', array_slice( $parts, 0 - $slice ) );
}
IMPORTANT ADDITIONAL NOTES: I don't use this function to validate domains. It is generic code I only use to extract the base domain of the server it is running on from the global $_SERVER['SERVER_NAME'], for use within various internal scripts. Since I have only ever worked on sites within the US, I had never encountered the Australian variants that pjv asked about. It is handy for internal use, but it is a long way from a complete domain validation process. If you are trying to use it that way, I recommend against it, because it can match too many invalid domains.
Answer by Klaas Sangers
Try using parse_url(): http://php.net/manual/en/function.parse-url.php. Something like this should work:
$urlParts = parse_url($yourUrl);
$hostParts = explode('.', $urlParts['host']);
$hostParts = array_reverse($hostParts);
$host = $hostParts[1] . '.' . $hostParts[0];
Answer by Faizan Anwer Ali Rupani
Building on xil3's answer, this version also checks for localhost and IP addresses, so it works in a development environment too.
You still have to define which TLDs you want to use; other than that, everything works fine.
<?php
function getTopLevelDomain($url){
    $urlData = parse_url($url);
    $urlHost = isset($urlData['host']) ? $urlData['host'] : '';
    $isIP = (ip2long($urlHost) !== false); /* compare with false: ip2long('0.0.0.0') returns 0 */
    if($isIP){ /** If it's an IP address, return the IP unchanged */
        return $urlHost;
    }
    /** Add/edit your TLDs here */
    $urlMap = array('com', 'com.pk', 'co.uk');
    $host = "";
    $hostData = explode('.', $urlHost);
    if(isset($hostData[1])){ /** Skipped for "localhost", which has no TLD */
        $hostData = array_reverse($hostData);
        if(array_search($hostData[1] . '.' . $hostData[0], $urlMap) !== FALSE) {
            $host = $hostData[2] . '.' . $hostData[1] . '.' . $hostData[0];
        } elseif(array_search($hostData[0], $urlMap) !== FALSE) {
            $host = $hostData[1] . '.' . $hostData[0];
        }
        return $host;
    }
    return ((isset($hostData[0]) && $hostData[0] != '') ? $hostData[0] : 'error no domain'); /* You can change this error message */
}
?>
You can use it like this:
$string = 'http://googl.com.pk';
echo getTopLevelDomain( $string ) . '<br>';
$string = 'http://googl.com.pk:23';
echo getTopLevelDomain( $string ) . '<br>';
$string = 'http://googl.com';
echo getTopLevelDomain( $string ) . '<br>';
$string = 'http://googl.com:23';
echo getTopLevelDomain( $string ) . '<br>';
$string = 'http://adad.asdasd.googl.com.pk';
echo getTopLevelDomain( $string ) . '<br>';
$string = 'http://adad.asdasd.googl.com.pk:23';
echo getTopLevelDomain( $string ) . '<br>';
$string = 'http://adad.asdasd.googl.com';
echo getTopLevelDomain( $string ) . '<br>';
$string = 'http://adad.asdasd.googl.com:23';
echo getTopLevelDomain( $string ) . '<br>';
$string = 'http://192.168.0.101:23';
echo getTopLevelDomain( $string ) . '<br>';
$string = 'http://192.168.0.101';
echo getTopLevelDomain( $string ) . '<br>';
$string = 'http://localhost';
echo getTopLevelDomain( $string ) . '<br>';
$string = 'https;//';
echo getTopLevelDomain( $string ) . '<br>';
$string = '';
echo getTopLevelDomain( $string ) . '<br>';
You'll get results like this:
googl.com.pk
googl.com.pk
googl.com
googl.com
googl.com.pk
googl.com.pk
googl.com
googl.com
192.168.0.101
192.168.0.101
localhost
error no domain
error no domain
Answer by Doctor Eval
I'm not a PHP developer and I know this isn't the full solution, but I think the general problem is actually identifying all of the possible public domain names.
Luckily, there is a list of public domains maintained at https://publicsuffix.org/list/. The list is broken into two sections. The first section is public domain names, which includes many of those listed in these comments, such as .com and .com.au. The public domain names are delimited by ===BEGIN ICANN DOMAINS=== and ===END ICANN DOMAINS===.
If you load just the ICANN DOMAINS list then you can identify the top-level domain names. But it would take a PHP developer to explain how to do that efficiently :)
If you load the whole list then you can get information about private subdomains as well, such as those under github.io.
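For PHP readers, here is a rough sketch of the Public Suffix List matching algorithm described on publicsuffix.org, loading only the ICANN section. Rules are normal, wildcard (*.) or exception (!). An exception rule wins; otherwise the longest matching rule wins, and if nothing matches, the last label is the suffix. The registrable ("base") domain is the public suffix plus one more label. The function names loadIcannRules() and registrableDomain() and the trimmed sample list are assumptions for illustration. In practice you would pass the full public_suffix_list.dat text.

```php
<?php
// Parse the ICANN section of public_suffix_list.dat (passed in as a string).
// Caveat: the list stores IDN rules in Unicode, so punycode ("xn--") hosts may
// need converting (e.g. with idn_to_utf8() from the intl extension) first.
function loadIcannRules(string $pslText): array
{
    $rules = ['normal' => [], 'wildcard' => [], 'exception' => []];
    $inIcann = false;
    foreach (preg_split('/\R/', $pslText) as $line) {
        if (strpos($line, '===BEGIN ICANN DOMAINS===') !== false) { $inIcann = true; continue; }
        if (strpos($line, '===END ICANN DOMAINS===') !== false) { break; }
        $line = trim($line);
        if (!$inIcann || $line === '' || strpos($line, '//') === 0) {
            continue; // outside the ICANN section, blank, or a comment
        }
        $rule = strtolower(preg_split('/\s+/', $line)[0]); // a rule ends at the first whitespace
        if ($rule[0] === '!') {
            $rules['exception'][substr($rule, 1)] = true; // "!www.ck" -> "www.ck"
        } elseif (strpos($rule, '*.') === 0) {
            $rules['wildcard'][substr($rule, 2)] = true;  // "*.ck" -> "ck"
        } else {
            $rules['normal'][$rule] = true;
        }
    }
    return $rules;
}

// Returns the registrable domain, or null if the host is itself a public suffix.
function registrableDomain(string $host, array $rules): ?string
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $n = count($labels);
    $suffixLen = null; // number of labels in the public suffix
    // 1. An exception rule wins; its public suffix drops the rule's leftmost label.
    for ($i = 0; $i < $n && $suffixLen === null; $i++) {
        if (isset($rules['exception'][implode('.', array_slice($labels, $i))])) {
            $suffixLen = $n - $i - 1;
        }
    }
    // 2. Otherwise the longest normal or wildcard match wins ($i = 0 is longest).
    for ($i = 0; $i < $n && $suffixLen === null; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        $parent = implode('.', array_slice($labels, $i + 1));
        if (isset($rules['normal'][$candidate]) || ($i < $n - 1 && isset($rules['wildcard'][$parent]))) {
            $suffixLen = $n - $i;
        }
    }
    $suffixLen = $suffixLen ?? 1; // 3. No match: the default rule "*" (last label)
    if ($suffixLen >= $n) {
        return null; // e.g. "co.uk" has no registrable domain
    }
    return implode('.', array_slice($labels, $n - $suffixLen - 1));
}

$sample = <<<'PSL'
// ===BEGIN ICANN DOMAINS===
com
uk
co.uk
com.au
*.ck
!www.ck
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
github.io
// ===END PRIVATE DOMAINS===
PSL;
$rules = loadIcannRules($sample);
echo registrableDomain('new.images.google.co.uk', $rules), "\n"; // google.co.uk
echo registrableDomain('foo.github.io', $rules), "\n";           // github.io
var_dump(registrableDomain('co.uk', $rules));                     // NULL
```

Because only the ICANN section is loaded, foo.github.io resolves to github.io. Loading the private section as well would make github.io itself a suffix.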
Answer by Widyo Rio
Use this function:
function getHost($url){
    // Prepend a scheme only when the URL doesn't already have one.
    // (strpos() returns 0 for a match at the start, so it must be compared with false.)
    if (strpos($url, "://") !== false){
        $httpurl=$url;
    } else {
        $httpurl="http://".$url;
    }
    $parse = parse_url($httpurl);
    $domain=$parse['host'];
    $portion=explode(".",$domain);
    $count=sizeof($portion)-1;
    if ($count>1){
        $result=$portion[$count-1].".".$portion[$count];
    } else {
        $result=$domain;
    }
    return $result;
}
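This returns google.com for all of the google.com example URLs. Note that it always keeps only the last two labels, so the google.co.uk examples come back as co.uk.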

