php 从 URL 解析域

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/276516/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-24 22:11:00  来源:igfitidea点击:

Parsing domain from a URL

php

提问by zuk1

I need to build a function which parses the domain from a URL.

我需要构建一个从 URL 解析域的函数。

So, with

所以,与

http://google.com/dhasjkdas/sadsdds/sdda/sdads.html

http://google.com/dhasjkdas/sadsdds/sdda/sdads.html

or

或者

http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html

http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html

it should return google.com

它应该返回 google.com

with

http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html

http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html

it should return google.co.uk.

它应该返回google.co.uk

回答by Owen

Check out parse_url():

退房parse_url()

$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$parse = parse_url($url);
echo $parse['host']; // prints 'google.com'

parse_urldoesn't handle really badly mangled urls very well, but is fine if you generally expect decent urls.

parse_url不能很好地处理严重损坏的 url,但如果您通常期望不错的 url,那就没问题了。

回答by Alix Axel

$domain = str_ireplace('www.', '', parse_url($url, PHP_URL_HOST));

This would return the google.comfor both http://google.com/... and http://www.google.com/...

这将返回google.com两个http://google.com/...和http://www.google.com/...

回答by philfreo

From http://us3.php.net/manual/en/function.parse-url.php#93983

来自http://us3.php.net/manual/en/function.parse-url.php#93983

for some odd reason, parse_url returns the host (ex. example.com) as the path when no scheme is provided in the input url. So I've written a quick function to get the real host:

出于某种奇怪的原因,当输入 url 中没有提供方案时,parse_url 返回主机(例如 example.com)作为路径。所以我写了一个快速函数来获取真正的主机:

function getHost($Address) { 
   $parseUrl = parse_url(trim($Address)); 
   return trim($parseUrl['host'] ? $parseUrl['host'] : array_shift(explode('/', $parseUrl['path'], 2))); 
} 

getHost("example.com"); // Gives example.com 
getHost("http://example.com"); // Gives example.com 
getHost("www.example.com"); // Gives www.example.com 
getHost("http://example.com/xyz"); // Gives example.com 

回答by Shaun

The code that was meant to work 100% didn't seem to cut it for me, I did patch the example a little but found code that wasn't helping and problems with it. so I changed it out to a couple of functions (to save asking for the list from Mozilla all the time, and removing the cache system). This has been tested against a set of 1000 URLs and seemed to work.

本来可以 100% 工作的代码似乎对我来说并没有削减它,我确实对示例进行了一些修补,但发现代码没有帮助并且存在问题。所以我把它改成了几个功能(为了避免一直从 Mozilla 请求列表,并删除缓存系统)。这已经针对一组 1000 个 URL 进行了测试,并且似乎有效。

function domain($url)
{
    global $subtlds;
    $slds = "";
    $url = strtolower($url);

    $host = parse_url('http://'.$url,PHP_URL_HOST);

    preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
    foreach($subtlds as $sub){
        if (preg_match('/\.'.preg_quote($sub).'$/', $host, $xyz)){
            preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
        }
    }

    return @$matches[0];
}

function get_tlds() {
    $address = 'http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1';
    $content = file($address);
    foreach ($content as $num => $line) {
        $line = trim($line);
        if($line == '') continue;
        if(@substr($line[0], 0, 2) == '/') continue;
        $line = @preg_replace("/[^a-zA-Z0-9\.]/", '', $line);
        if($line == '') continue;  //$line = '.'.$line;
        if(@$line[0] == '.') $line = substr($line, 1);
        if(!strstr($line, '.')) continue;
        $subtlds[] = $line;
        //echo "{$num}: '{$line}'"; echo "<br>";
    }

    $subtlds = array_merge(array(
            'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk', 
            'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk', 'asn.au', 'com.au',
            'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au'
        ), $subtlds);

    $subtlds = array_unique($subtlds);

    return $subtlds;    
}

Then use it like

然后像这样使用它

$subtlds = get_tlds();
echo domain('www.example.com') //outputs: example.com
echo domain('www.example.uk.com') //outputs: example.uk.com
echo domain('www.example.fr') //outputs: example.fr

I know I should have turned this into a class, but didn't have time.

我知道我应该把它变成一门课,但没有时间。

回答by nikmauro

function get_domain($url = SITE_URL)
{
    preg_match("/[a-z0-9\-]{1,63}\.[a-z\.]{2,6}$/", parse_url($url, PHP_URL_HOST), $_domain_tld);
    return $_domain_tld[0];
}

get_domain('http://www.cdl.gr'); //cdl.gr
get_domain('http://cdl.gr'); //cdl.gr
get_domain('http://www2.cdl.gr'); //cdl.gr

回答by Oleksandr Fediashov

If you want extract host from string http://google.com/dhasjkdas/sadsdds/sdda/sdads.html, usage of parse_url() is acceptable solution for you.

如果你想从 string 中提取主机http://google.com/dhasjkdas/sadsdds/sdda/sdads.html,使用 parse_url() 是你可以接受的解决方案。

But if you want extract domain or its parts, you need package that using Public Suffix List. Yes, you can use string functions arround parse_url(), but it will produce incorrect results sometimes.

但是,如果您想提取域或其部分,则需要使用Public Suffix List 进行打包。是的,您可以在 parse_url() 周围使用字符串函数,但有时会产生不正确的结果。

I recomend TLDExtractfor domain parsing, here is sample code that show diff:

我建议使用TLDExtract进行域解析,这里是显示差异的示例代码:

$extract = new LayerShifter\TLDExtract\Extract();

# For 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'

$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';

parse_url($url, PHP_URL_HOST); // will return google.com

$result = $extract->parse($url);
$result->getFullHost(); // will return 'google.com'
$result->getRegistrableDomain(); // will return 'google.com'
$result->getSuffix(); // will return 'com'

# For 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html'

$url = 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html';

parse_url($url, PHP_URL_HOST); // will return 'search.google.com'

$result = $extract->parse($url);
$result->getFullHost(); // will return 'search.google.com'
$result->getRegistrableDomain(); // will return 'google.com'

回答by fatih

I've found that @philfreo's solution (referenced from php.net) is pretty well to get fine result but in some cases it shows php's "notice" and "Strict Standards" message. Here a fixed version of this code.

我发现@philfreo 的解决方案(从 php.net 引用)可以很好地获得良好的结果,但在某些情况下,它显示了 php 的“通知”和“严格标准”消息。这是此代码的固定版本。

function getHost($url) { 
   $parseUrl = parse_url(trim($url)); 
   if(isset($parseUrl['host']))
   {
       $host = $parseUrl['host'];
   }
   else
   {
        $path = explode('/', $parseUrl['path']);
        $host = $path[0];
   }
   return trim($host); 
} 

echo getHost("http://example.com/anything.html");           // example.com
echo getHost("http://www.example.net/directory/post.php");  // www.example.net
echo getHost("https://example.co.uk");                      // example.co.uk
echo getHost("www.example.net");                            // example.net
echo getHost("subdomain.example.net/anything");             // subdomain.example.net
echo getHost("example.net");                                // example.net

回答by Kristoffer Bohmann

Please consider replacring the accepted solution with the following:

请考虑使用以下内容替换已接受的解决方案:

parse_url() will always include any sub-domain(s), so this function doesn't parse domain names very well. Here are some examples:

parse_url() 将始终包含任何子域,因此该函数不能很好地解析域名。这里有些例子:

$url = 'http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$parse = parse_url($url);
echo $parse['host']; // prints 'www.google.com'

echo parse_url('https://subdomain.example.com/foo/bar', PHP_URL_HOST);
// Output: subdomain.example.com

echo parse_url('https://subdomain.example.co.uk/foo/bar', PHP_URL_HOST);
// Output: subdomain.example.co.uk

Instead, you may consider this pragmatic solution. It will cover many, but not all domain names -- for instance, lower-level domains such as 'sos.state.oh.us' are not covered.

相反,您可以考虑这种务实的解决方案。它将涵盖许多域名,但不是所有域名——例如,不包括诸如“sos.state.oh.us”之类的较低级别的域。

function getDomain($url) {
    $host = parse_url($url, PHP_URL_HOST);

    if(filter_var($host,FILTER_VALIDATE_IP)) {
        // IP address returned as domain
        return $host; //* or replace with null if you don't want an IP back
    }

    $domain_array = explode(".", str_replace('www.', '', $host));
    $count = count($domain_array);
    if( $count>=3 && strlen($domain_array[$count-2])==2 ) {
        // SLD (example.co.uk)
        return implode('.', array_splice($domain_array, $count-3,3));
    } else if( $count>=2 ) {
        // TLD (example.com)
        return implode('.', array_splice($domain_array, $count-2,2));
    }
}

// Your domains
    echo getDomain('http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
    echo getDomain('http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
    echo getDomain('http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html'); // google.co.uk

// TLD
    echo getDomain('https://shop.example.com'); // example.com
    echo getDomain('https://foo.bar.example.com'); // example.com
    echo getDomain('https://www.example.com'); // example.com
    echo getDomain('https://example.com'); // example.com

// SLD
    echo getDomain('https://more.news.bbc.co.uk'); // bbc.co.uk
    echo getDomain('https://www.bbc.co.uk'); // bbc.co.uk
    echo getDomain('https://bbc.co.uk'); // bbc.co.uk

// IP
    echo getDomain('https://1.2.3.45');  // 1.2.3.45

Finally, Jeremy Kendall's PHP Domain Parserallows you to parse the domain name from a url. League URI Hostname Parserwill also do the job.

最后,Jeremy Kendall 的PHP Domain Parser允许您从 url 解析域名。League URI Hostname Parser也将完成这项工作。

回答by Michael

$domain = parse_url($url, PHP_URL_HOST);
echo implode('.', array_slice(explode('.', $domain), -2, 2))

回答by Oleg Matei

You can pass PHP_URL_HOST into parse_url function as second parameter

您可以将 PHP_URL_HOST 作为第二个参数传递给 parse_url 函数

$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$host = parse_url($url, PHP_URL_HOST);
print $host; // prints 'google.com'