Get domain name (not subdomain) in PHP
Note: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/2679618/
Asked by Cyclone
I have a URL which can be any of the following formats:
http://example.com
https://example.com
http://example.com/foo
http://example.com/foo/bar
www.example.com
example.com
foo.example.com
www.foo.example.com
foo.bar.example.com
http://foo.bar.example.com/foo/bar
example.net/foo/bar
Essentially, I need to be able to match any normal URL. How can I extract example.com (or .net, or whatever the TLD happens to be; I need this to work with any TLD) from all of these via a single regex?
Answered by Tyler Carter
Well, you can use parse_url to get the host:
$info = parse_url($url);
$host = $info['host'];
Then, you can do some fancy stuff to get only the TLD and the Host
$host_names = explode(".", $host);
$bottom_host_name = $host_names[count($host_names)-2] . "." . $host_names[count($host_names)-1];
Not very elegant, but should work.
If you want an explanation, here it goes:
First we grab everything after the scheme (http://, etc.), by using parse_url's capabilities to... well... parse URLs. :)
Then we take the host name and separate it into an array based on where the periods fall, so test.world.hello.myname would become:
array("test", "world", "hello", "myname");
After that, we take the number of elements in the array (4).
Then, we subtract 2 from it to get the second to last string (the hostname, or example, in your example)
Then, we subtract 1 from it to get the last string (because array keys start at 0), also known as the TLD
Then we combine those two parts with a period, and you have your base host name.
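For a quick end-to-end check, the two snippets above can be combined like this (a minimal sketch; the sample URL and the expected output in the comment are my own, and note that this approach misreports two-level suffixes such as co.uk, which is addressed in later answers):
$url = 'http://foo.bar.example.com/foo/bar';
$info = parse_url($url);
$host_names = explode(".", $info['host']);
$bottom_host_name = $host_names[count($host_names)-2] . "." . $host_names[count($host_names)-1];
echo $bottom_host_name; // example.com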
Answered by pocesar
My solution is at https://gist.github.com/pocesar/5366899
and the tests are here http://codepad.viper-7.com/GAh1tP
It works with any TLD, and hideous subdomain patterns (up to 3 subdomains).
There's a test included with many domain names.
I won't paste the function here because of the weird indentation for code on StackOverflow (it could have fenced code blocks like GitHub).
Answered by mgutt
It is not possible to get the domain name without comparing against a TLD list, as there exist many cases with exactly the same structure and length:
- www.db.de (Subdomain) versus bbc.co.uk (Domain)
- big.uk.com (SLD) versus www.uk.com (TLD)
Mozilla's public suffix list should be the best option as it is used by all major browsers:
https://publicsuffix.org/list/public_suffix_list.dat
Feel free to use my function:
function tld_list($cache_dir=null) {
    // we use "/tmp" if $cache_dir is not set
    $cache_dir = isset($cache_dir) ? $cache_dir : sys_get_temp_dir();
    $lock_dir = $cache_dir . '/public_suffix_list_lock/';
    $list_dir = $cache_dir . '/public_suffix_list/';
    // refresh list all 30 days
    if (file_exists($list_dir) && @filemtime($list_dir) + 2592000 > time()) {
        return $list_dir;
    }
    // use exclusive lock to avoid race conditions
    if (!file_exists($lock_dir) && @mkdir($lock_dir)) {
        // read from source
        $list = @fopen('https://publicsuffix.org/list/public_suffix_list.dat', 'r');
        if ($list) {
            // the list is older than 30 days so delete everything first
            if (file_exists($list_dir)) {
                foreach (glob($list_dir . '*') as $filename) {
                    unlink($filename);
                }
                rmdir($list_dir);
            }
            // now set list directory with new timestamp
            mkdir($list_dir);
            // read line-by-line to avoid high memory usage
            while ($line = fgets($list)) {
                // skip comments and empty lines
                if ($line[0] == '/' || !$line) {
                    continue;
                }
                // remove wildcard
                if ($line[0] . $line[1] == '*.') {
                    $line = substr($line, 2);
                }
                // remove exclamation mark
                if ($line[0] == '!') {
                    $line = substr($line, 1);
                }
                // reverse TLD and remove linebreak
                $line = implode('.', array_reverse(explode('.', (trim($line)))));
                // we split the TLD list to reduce memory usage
                touch($list_dir . $line);
            }
            fclose($list);
        }
        @rmdir($lock_dir);
    }
    // repair locks (should never happen)
    if (file_exists($lock_dir) && mt_rand(0, 100) == 0 && @filemtime($lock_dir) + 86400 < time()) {
        @rmdir($lock_dir);
    }
    return $list_dir;
}
function get_domain($url=null) {
    // obtain location of public suffix list
    $tld_dir = tld_list();
    // no url = our own host
    $url = isset($url) ? $url : $_SERVER['SERVER_NAME'];
    // add missing scheme ftp:// http:// ftps:// https://
    $url = !isset($url[5]) || ($url[3] != ':' && $url[4] != ':' && $url[5] != ':') ? 'http://' . $url : $url;
    // remove "/path/file.html", "/:80", etc.
    $url = parse_url($url, PHP_URL_HOST);
    // replace absolute domain name by relative (http://www.dns-sd.org/TrailingDotsInDomainNames.html)
    $url = trim($url, '.');
    // check if TLD exists
    $url = explode('.', $url);
    $parts = array_reverse($url);
    foreach ($parts as $key => $part) {
        $tld = implode('.', $parts);
        if (file_exists($tld_dir . $tld)) {
            return !$key ? '' : implode('.', array_slice($url, $key - 1));
        }
        // remove last part
        array_pop($parts);
    }
    return '';
}
What makes it special:
- it accepts every input like URLs, hostnames or domains, with or without a scheme
- the list is downloaded row-by-row to avoid high memory usage
- it creates a new file per TLD in a cache folder, so get_domain() only needs to check through file_exists() whether it exists; it does not need to include a huge database on every request like TLDExtract does
- the list will be automatically updated every 30 days
Test:
$urls = array(
'http://www.example.com',// example.com
'http://subdomain.example.com',// example.com
'http://www.example.uk.com',// example.uk.com
'http://www.example.co.uk',// example.co.uk
'http://www.example.com.ac',// example.com.ac
'http://example.com.ac',// example.com.ac
'http://www.example.accident-prevention.aero',// example.accident-prevention.aero
'http://www.example.sub.ar',// sub.ar
'http://www.congresodelalengua3.ar',// congresodelalengua3.ar
'http://congresodelalengua3.ar',// congresodelalengua3.ar
'http://www.example.pvt.k12.ma.us',// example.pvt.k12.ma.us
'http://www.example.lib.wy.us',// example.lib.wy.us
'com',// empty
'.com',// empty
'http://big.uk.com',// big.uk.com
'uk.com',// empty
'www.uk.com',// www.uk.com
'.uk.com',// empty
'stackoverflow.com',// stackoverflow.com
'.foobarfoo',// empty
'',// empty
false,// empty
' ',// empty
1,// empty
'a',// empty
);
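A minimal loop to run these inputs through the function above (just a sketch; var_dump is used only so that empty results are visible):
foreach ($urls as $url) {
    var_dump(get_domain($url));
}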
Recent version with explanations (German):
http://www.programmierer-forum.de/domainnamen-ermitteln-t244185.htm
Answered by user2116044
$onlyHostName = implode('.', array_slice(explode('.', parse_url($link, PHP_URL_HOST)), -2));
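For context, this one-liner simply keeps the last two host labels; a quick check (the $link values are illustrations only, and note that it misjudges two-level public suffixes such as co.uk):
$link = 'http://foo.bar.example.com/baz';
echo implode('.', array_slice(explode('.', parse_url($link, PHP_URL_HOST)), -2)), PHP_EOL; // example.com
$link = 'http://www.example.co.uk';
echo implode('.', array_slice(explode('.', parse_url($link, PHP_URL_HOST)), -2)), PHP_EOL; // co.uk, not example.co.uk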
Answered by mmeyer2k
I think the best way to handle this problem is:
$second_level_domains_regex = '/\.asn\.au$|\.com\.au$|\.net\.au$|\.id\.au$|\.org\.au$|\.edu\.au$|\.gov\.au$|\.csiro\.au$|\.act\.au$|\.nsw\.au$|\.nt\.au$|\.qld\.au$|\.sa\.au$|\.tas\.au$|\.vic\.au$|\.wa\.au$|\.co\.at$|\.or\.at$|\.priv\.at$|\.ac\.at$|\.avocat\.fr$|\.aeroport\.fr$|\.veterinaire\.fr$|\.co\.hu$|\.film\.hu$|\.lakas\.hu$|\.ingatlan\.hu$|\.sport\.hu$|\.hotel\.hu$|\.ac\.nz$|\.co\.nz$|\.geek\.nz$|\.gen\.nz$|\.kiwi\.nz$|\.maori\.nz$|\.net\.nz$|\.org\.nz$|\.school\.nz$|\.cri\.nz$|\.govt\.nz$|\.health\.nz$|\.iwi\.nz$|\.mil\.nz$|\.parliament\.nz$|\.ac\.za$|\.gov\.za$|\.law\.za$|\.mil\.za$|\.nom\.za$|\.school\.za$|\.net\.za$|\.co\.uk$|\.org\.uk$|\.me\.uk$|\.ltd\.uk$|\.plc\.uk$|\.net\.uk$|\.sch\.uk$|\.ac\.uk$|\.gov\.uk$|\.mod\.uk$|\.mil\.uk$|\.nhs\.uk$|\.police\.uk$/';
$domain = $_SERVER['HTTP_HOST'];
$domain = explode('.', $domain);
$domain = array_reverse($domain);
if (preg_match($second_level_domains_regex, $_SERVER['HTTP_HOST'])) {
    $domain = "$domain[2].$domain[1].$domain[0]";
} else {
    $domain = "$domain[1].$domain[0]";
}
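A minimal sanity check of the same logic using a hard-coded host instead of $_SERVER['HTTP_HOST'] (the host value here is just an illustration):
$host = 'www.example.co.uk';
$parts = array_reverse(explode('.', $host));
if (preg_match($second_level_domains_regex, $host)) {
    echo "$parts[2].$parts[1].$parts[0]"; // example.co.uk
} else {
    echo "$parts[1].$parts[0]";
}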
Answered by happy_marmoset
I recommend using the TLDExtract library for all operations with domain names.
Answered by Ehsan Chavoshi
There are two ways to extract the subdomain from a host:
The first method, which is more accurate, is to use a database of TLDs (like public_suffix_list.dat) and match the domain against it. This is a little heavy in some cases. There are some PHP classes for using it, like php-domain-parser and TLDExtract.
The second way is not as accurate as the first one, but it is very fast and can give the correct answer in many cases. I wrote this function for it:
function get_domaininfo($url) {
    // regex can be replaced with parse_url
    preg_match("/^(https|http|ftp):\/\/(.*?)\//", "$url/", $matches);
    $parts = explode(".", $matches[2]);
    $tld = array_pop($parts);
    $host = array_pop($parts);
    if (strlen($tld) == 2 && strlen($host) <= 3) {
        $tld = "$host.$tld";
        $host = array_pop($parts);
    }
    return array(
        'protocol' => $matches[1],
        'subdomain' => implode(".", $parts),
        'domain' => "$host.$tld",
        'host' => $host,
        'tld' => $tld
    );
}
Example:
print_r(get_domaininfo('http://mysubdomain.domain.co.uk/index.php'));
Returns:
Array ( [protocol] => http [subdomain] => mysubdomain [domain] => domain.co.uk [host] => domain [tld] => co.uk )
Answered by Greg Z
Here's a function I wrote to grab the domain without subdomain(s), regardless of whether the domain is using a ccTLD or a new style long TLD, etc... There is no lookup or huge array of known TLDs, and there's no regex. It can be a lot shorter using the ternary operator and nesting, but I expanded it for readability.
// Per Wikipedia: "All ASCII ccTLD identifiers are two letters long,
// and all two-letter top-level domains are ccTLDs."
function topDomainFromURL($url) {
    $url_parts = parse_url($url);
    $domain_parts = explode('.', $url_parts['host']);
    if (strlen(end($domain_parts)) == 2) {
        // ccTLD here, get last three parts
        $top_domain_parts = array_slice($domain_parts, -3);
    } else {
        $top_domain_parts = array_slice($domain_parts, -2);
    }
    $top_domain = implode('.', $top_domain_parts);
    return $top_domain;
}
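A couple of example calls (a sketch; the expected results in the comments are mine, derived from the two-letter ccTLD heuristic above):
echo topDomainFromURL('http://foo.bar.example.com/foo/bar'), PHP_EOL; // example.com
echo topDomainFromURL('http://www.example.co.uk'), PHP_EOL;           // example.co.uk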
Answered by Kuldip
echo getDomainOnly("http://example.com/foo/bar");
function getDomainOnly($host){
    $host = strtolower(trim($host));
    $host = str_replace(array("https://", "http://"), "", $host);
    // note: ltrim($host, "www.") would treat "www." as a character set and could
    // strip leading letters from other hosts, so remove the prefix explicitly
    if (strpos($host, "www.") === 0) {
        $host = substr($host, 4);
    }
    $count = substr_count($host, '.');
    if($count === 2){
        if(strlen(explode('.', $host)[1]) > 3) $host = explode('.', $host, 2)[1];
    } else if($count > 2){
        $host = getDomainOnly(explode('.', $host, 2)[1]);
    }
    $host = explode('/', $host);
    return $host[0];
}
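A few more sample calls for reference (these examples and the expected results are mine, not part of the original answer):
echo getDomainOnly("https://www.example.com/foo"), PHP_EOL;       // example.com
echo getDomainOnly("http://foo.bar.example.co.uk/page"), PHP_EOL; // example.co.uk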
Answered by multimediaxp
Very late, but I see that you tagged the question with regex, and my function works like a charm; so far I haven't found a URL that fails:
function get_domain_regex($url){
    $pieces = parse_url($url);
    $domain = isset($pieces['host']) ? $pieces['host'] : '';
    if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
        return $regs['domain'];
    } else {
        return false;
    }
}
If you want one without regex, I have this one, which I'm fairly sure I also took from this post:
function get_domain($url){
    $parseUrl = parse_url($url);
    $host = $parseUrl['host'];
    $host_array = explode(".", $host);
    $domain = $host_array[count($host_array)-2] . "." . $host_array[count($host_array)-1];
    return $domain;
}
They both work amazingly, BUT it took me a while to realize that if the URL doesn't start with http:// or https:// it will fail, so make sure the URL string starts with the protocol.
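One possible guard for that (just a sketch; it prepends http:// when no scheme is present before calling either function above):
$url = 'www.example.com/foo';
if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
    $url = 'http://' . $url;
}
echo get_domain_regex($url); // example.com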

