How to get a website's favicon with PHP?

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/5701593/

Date: 2020-08-25 22:16:54  Source: igfitidea

Tags: php, regex, favicon

Asked by Kemal

I want to get a requested website's favicon with PHP. I have been advised to use Google's favicon service, but it does not work for me. I want to write something of my own, but I don't know how to use regex.

I found a class via Google that works in most cases, but it has an unacceptable error rate. You can have a look here: http://www.controlstyle.com/articles/programming/text/php-favicon/

Can somebody please help me get the favicon using regex?

Accepted answer by adilbo

PHP Grab Favicon

This is a convenient way, with many parameters, to get the favicon from a page URL.

How it Works

  1. Check whether the favicon already exists locally or saving is not wanted; if so, return the path and filename
  2. Otherwise, load the URL and try to match the favicon location with a regex
  3. If there is a match, the favicon link is made absolute
  4. If no favicon was found, try to get one from the domain root
  5. If there is still no favicon, randomly try the Google, faviconkit and favicongrabber APIs
  6. If the favicon should be saved, try to load the favicon URL
  7. If wanted, save the favicon for next time and return the path and filename

So it combines both approaches: try to get the favicon from the page, and if that doesn't work, use an "API" service that returns the favicon ;-)
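Since the question asks specifically about regex, step 2 can be looked at in isolation. The following is a minimal sketch of the two-stage match (first the `<link>` tag, then its `href`), based on the patterns used in the script below. The function name `extractIconHref` is an assumption made for illustration, and a regex like this will not cope with every kind of markup.

```php
<?php
// Hypothetical helper: find the first icon <link> tag, then pull out its href.
function extractIconHref(string $html): ?string
{
    // Stage 1: a <link> tag whose rel value starts with icon / shortcut icon / alternate icon
    if (!preg_match('/<link[^>]+rel=.(icon|shortcut icon|alternate icon)[^>]+>/i', $html, $tag)) {
        return null;
    }
    // Stage 2: the quoted href value; \1 closes with the same quote that opened it
    if (!preg_match('/href=(\'|")(.*?)\1/i', $tag[0], $href)) {
        return null;
    }
    return trim($href[2]);
}
```

The returned value may still be a relative path, so it has to be made absolute afterwards (step 3).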

<?php
/*

PHP Grab Favicon
================

> This `PHP Favicon Grabber` use a given url, save a copy (if wished) and return the image path.

How it Works
------------

1. Check if the favicon already exists local or no save is wished, if so return path & filename
2. Else load URL and try to match the favicon location with regex
3. If we have a match the favicon link will be made absolute
4. If we have no favicon we try to get one in domain root
5. If there is still no favicon we randomly try google, faviconkit & favicongrabber API
6. If favicon should be saved try to load the favicon URL
7. If wished save the Favicon for the next time and return the path & filename

How to Use
----------

```PHP
$url = 'example.com';

$grap_favicon = array(
'URL' => $url,   // URL of the Page we like to get the Favicon from
'SAVE'=> true,   // Save Favicon copy local (true) or return only favicon url (false)
'DIR' => './',   // Local Dir the copy of the Favicon should be saved
'TRY' => true,   // Try to get the Favicon from the page (true) or only use the APIs (false)
'DEV' => null,   // Give all Debug-Messages ('debug') or only make the work (null)
);

echo '<img src="'.grap_favicon($grap_favicon).'">';
```

Todo
----
Optional split the download dir into several sub-dirs (MD5 segment of filename e.g. /af/cd/example.com.png) if there are a lot of favicons.

Infos about Favicon
-------------------
https://github.com/audreyr/favicon-cheat-sheet

###### Copyright 2019 Igor Gaffling

*/ 

$testURLs = array(
  'http://aws.amazon.com',
  'http://www.apple.com',
  'http://www.dribbble.com',
  'http://www.github.com',
  'http://www.intercom.com',
  'http://www.indiehackers.com',
  'http://www.medium.com',
  'http://www.mailchimp.com',
  'http://www.netflix.com',
  'http://www.producthunt.com',
  'http://www.reddit.com',
  'http://www.slack.com',
  'http://www.soundcloud.com',
  'http://www.stackoverflow.com',
  'http://www.techcrunch.com',
  'http://www.trello.com',
  'http://www.vimeo.com',
  'https://www.whatsapp.com/',
  'https://www.gaffling.com/',
);

foreach ($testURLs as $url) {
  $grap_favicon = array(
    'URL' => $url,   // URL of the Page we like to get the Favicon from
    'SAVE'=> true,   // Save Favicon copy local (true) or return only favicon url (false)
    'DIR' => './',   // Local Dir the copy of the Favicon should be saved
    'TRY' => true,   // Try to get the Favicon from the page (true) or only use the APIs (false)
    'DEV' => null,   // Give all Debug-Messages ('debug') or only make the work (null)
  );
  $favicons[] = grap_favicon($grap_favicon);
}
foreach ($favicons as $favicon) {
  echo '<img title="'.$favicon.'" style="width:32px;padding-right:32px;" src="'.$favicon.'">';
}
echo '<br><br><tt>Runtime: '.round((microtime(true)-$_SERVER["REQUEST_TIME_FLOAT"]),2).' Sec.';

function grap_favicon( $options=array() ) {

  // Ini Vars
  $url       = (isset($options['URL']))?$options['URL']:'gaffling.com';
  $save      = (isset($options['SAVE']))?$options['SAVE']:true;
  $directory = (isset($options['DIR']))?$options['DIR']:'./';
  $trySelf   = (isset($options['TRY']))?$options['TRY']:true;
  $DEBUG     = (isset($options['DEV']))?$options['DEV']:null;

  // URL to lower case
  $url = strtolower($url);

  // Get the Domain from the URL
  $domain = parse_url($url, PHP_URL_HOST);

  // Check Domain
  $domainParts = explode('.', $domain);
  if(count($domainParts) == 3 and $domainParts[0]!='www') {
    // With Subdomain (if not www)
    $domain = $domainParts[0].'.'.
              $domainParts[count($domainParts)-2].'.'.$domainParts[count($domainParts)-1];
  } else if (count($domainParts) >= 2) {
    // Without Subdomain
    $domain = $domainParts[count($domainParts)-2].'.'.$domainParts[count($domainParts)-1];
  } else {
    // Without http(s)
    $domain = $url;
  }

  // FOR DEBUG ONLY
  if($DEBUG=='debug')print('<b style="color:red;">Domain</b> #'.@$domain.'#<br>');

  // Make Path & Filename
  $filePath = preg_replace('#\/\/#', '/', $directory.'/'.$domain.'.png');

  // If Favicon not already exists local
  if ( !file_exists($filePath) or @filesize($filePath)==0 ) {

    // If $trySelf == TRUE: try the page itself first (otherwise ONLY USE APIs)
    if ( isset($trySelf) and $trySelf == TRUE ) {  

      // Load Page
      $html = load($url, $DEBUG);

      // Find Favicon with RegEx
      $regExPattern = '/((<link[^>]+rel=.(icon|shortcut icon|alternate icon)[^>]+>))/i';
      if ( @preg_match($regExPattern, $html, $matchTag) ) {
        // \1 matches the closing quote; without it the lazy (.*?) would capture an empty string
        $regExPattern = '/href=(\'|\")(.*?)\1/i';
        if ( isset($matchTag[1]) and @preg_match($regExPattern, $matchTag[1], $matchUrl)) {
          if ( isset($matchUrl[2]) ) {

            // Build Favicon Link
            $favicon = rel2abs(trim($matchUrl[2]), 'http://'.$domain.'/');

            // FOR DEBUG ONLY
            if($DEBUG=='debug')print('<b style="color:red;">Match</b> #'.@$favicon.'#<br>');

          }
        }
      }

      // If there is no Match: Try if there is a Favicon in the Root of the Domain
      if ( empty($favicon) ) {
        $favicon = 'http://'.$domain.'/favicon.ico';

        // Try to Load Favicon
        if ( !@getimagesize($favicon) ) {
          unset($favicon);
        }
      }

    } // END If $trySelf == TRUE

    // If nothing works: Get the Favicon from API
    if ( !isset($favicon) or empty($favicon) ) {

      // Select API by Random
      $random = rand(1,3);

      // Faviconkit API
      if ($random == 1 or empty($favicon)) {
        $favicon = 'https://api.faviconkit.com/'.$domain.'/16';
      }

      // Favicongrabber API
      if ($random == 2 or empty($favicon)) {
        $echo = json_decode(load('http://favicongrabber.com/api/grab/'.$domain,FALSE),TRUE);

        // Get Favicon URL from Array out of json data (@ if something went wrong)
        $favicon = @$echo['icons']['0']['src'];

      }

      // Google API (check also md5() later)
      if ($random == 3) {
        $favicon = 'http://www.google.com/s2/favicons?domain='.$domain;
      } 

      // FOR DEBUG ONLY
      if($DEBUG=='debug')print('<b style="color:red;">'.$random.'. API</b> #'.@$favicon.'#<br>');

    } // END If nothing works: Get the Favicon from API

    // Write Favicon local
    $filePath = preg_replace('#\/\/#', '/', $directory.'/'.$domain.'.png');

    // If Favicon should be saved
    if ( isset($save) and $save == TRUE ) {

      //  Load Favicon
      $content = load($favicon, $DEBUG);

      // If Google API don't know and deliver a default Favicon (World)
      if ( isset($random) and $random == 3 and 
           md5($content) == '3ca64f83fdcf25135d87e08af65e68c9' ) {
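        $domain = 'default'; // so we don't save a default icon for every domain again
        // Rebuild the path, otherwise the new $domain is never used for the file name
        $filePath = preg_replace('#\/\/#', '/', $directory.'/'.$domain.'.png');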

        // FOR DEBUG ONLY
        if($DEBUG=='debug')print('<b style="color:red;">Google</b> #use default icon#<br>');

      }

      // Write (skipped if the file cannot be opened)
      $fh = @fopen($filePath, 'wb');
      if ( $fh !== false ) {
        fwrite($fh, $content);
        fclose($fh);
      }

      // FOR DEBUG ONLY
      if($DEBUG=='debug')print('<b style="color:red;">Write-File</b> #'.@$filePath.'#<br>');

    } else {

      // Don't save Favicon local, only return Favicon URL
      $filePath = $favicon;
    }

  } // END If Favicon not already exists local

  // FOR DEBUG ONLY
  if ($DEBUG=='debug') {

    // Load the Favicon from local file
    if ( !function_exists('file_get_contents') ) {
      $content = ''; // start empty so the .= below does not hit an undefined variable
      $fh = @fopen($filePath, 'r');
      while (!feof($fh)) {
        $content .= fread($fh, 128); // Because filesize() will not work on URLS?
      }
      fclose($fh);
    } else {
      $content = file_get_contents($filePath);
    }
    print('<b style="color:red;">Image</b> <img style="width:32px;" 
           src="data:image/png;base64,'.base64_encode($content).'"><hr size="1">');
  }

  // Return Favicon Url
  return $filePath;

} // END MAIN Function

/* HELPER load use curl or file_get_contents (both with user_agent) and fopen/fread as fallback */
function load($url, $DEBUG) {
  if ( function_exists('curl_version') ) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'FaviconBot/1.0 (+http://'.$_SERVER['SERVER_NAME'].'/)');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $content = curl_exec($ch);
    if ( $DEBUG=='debug' ) { // FOR DEBUG ONLY
      $http_code = curl_getinfo($ch);
      print('<b style="color:red;">cURL</b> #'.$http_code['http_code'].'#<br>');
    }
    curl_close($ch);
    unset($ch);
  } else {
    $context = array ( 'http' => array (
        'user_agent' => 'FaviconBot/1.0 (+http://'.$_SERVER['SERVER_NAME'].'/)'),
    );
    $context = stream_context_create($context);
    if ( !function_exists('file_get_contents') ) {
      $fh = fopen($url, 'r', FALSE, $context);
      $content = '';
      while (!feof($fh)) {
        $content .= fread($fh, 128); // Because filesize() will not work on URLS?
      }
      fclose($fh);
    } else {
      $content = file_get_contents($url, NULL, $context);
    }
  }
  return $content;
}

/* HELPER: Change URL from relative to absolute */
function rel2abs( $rel, $base ) {
    extract( parse_url( $base ) );
    if ( strpos( $rel,"//" ) === 0 ) return $scheme . ':' . $rel;
    if ( parse_url( $rel, PHP_URL_SCHEME ) != '' ) return $rel;
    if ( $rel[0] == '#' or $rel[0] == '?' ) return $base . $rel;
    $path = preg_replace( '#/[^/]*$#', '', $path);
    if ( $rel[0] ==  '/' ) $path = '';
    $abs = $host . $path . "/" . $rel;
    $abs = preg_replace( "/(\/\.?\/)/", "/", $abs);
    $abs = preg_replace( "/\/(?!\.\.)[^\/]+\/\.\.\//", "/", $abs);
    return $scheme . '://' . $abs;
}

Source: https://github.com/gaffling/PHP-Grab-Favicon
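
The Todo in the script's header comment mentions splitting the download directory into sub-directories based on an MD5 segment of the filename. A minimal sketch of that idea follows. The function name `shardedFaviconPath` and the choice of two 2-character levels taken from the start of the hash are assumptions for illustration; they are not part of the original script.

```php
<?php
// Hypothetical helper: build a path like ./icons/af/cd/example.com.png,
// where "af" and "cd" are the first four hex characters of md5(filename).
function shardedFaviconPath(string $directory, string $domain): string
{
    $filename = $domain . '.png';
    $hash = md5($filename);
    return rtrim($directory, '/') . '/'
        . substr($hash, 0, 2) . '/'
        . substr($hash, 2, 2) . '/'
        . $filename;
}
```

The sub-directories would have to be created before writing, for example with mkdir(dirname($path), 0755, true).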

Answer by Starx

Use the S2 service provided by Google. It is as simple as this:

http://www.google.com/s2/favicons?domain=www.yourdomain.com

Scraping this would be much easier than trying to do it yourself.
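
As a rough sketch of how that URL could be used from PHP: the helper names `googleFaviconUrl` and `fetchGoogleFavicon` are made up for this example, only the `domain` parameter shown above is used, and the download assumes `allow_url_fopen` is enabled.

```php
<?php
// Build the S2 URL for a host name (hypothetical helper).
function googleFaviconUrl(string $host): string
{
    return 'http://www.google.com/s2/favicons?domain=' . rawurlencode($host);
}

// Download the icon bytes; returns null when the request fails.
function fetchGoogleFavicon(string $host): ?string
{
    $data = @file_get_contents(googleFaviconUrl($host));
    return $data === false ? null : $data;
}
```

If you only need to display the icon, echoing googleFaviconUrl() into an img tag is enough; fetchGoogleFavicon() is only needed when you want a local copy.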

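Answer by vooD

Quick and dirty:

<?php
$url = 'http://example.com/';
$doc = new DOMDocument();
$doc->strictErrorChecking = FALSE;
// Collect parser warnings instead of printing them; real-world HTML is rarely valid
libxml_use_internal_errors(true);
$doc->loadHTML(file_get_contents($url));
$xml = simplexml_import_dom($doc);
// Only matches rel="shortcut icon" exactly; rel="icon" is not covered
$arr = $xml->xpath('//link[@rel="shortcut icon"]');
if (!empty($arr)) {
    echo $arr[0]['href'];
}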

Answer by Marcel

It looks like http://www.getfavicon.org/?url=domain.com (FAQ) reliably scrapes a website's favicon. I realise it's a third-party service, but I think it's a worthy alternative to the Google favicon service.

Answer by blackpla9ue

I've been doing something similar. I checked this with a bunch of URLs and all of them seemed to work. The URL doesn't have to be a base URL.

function getFavicon($url){
    # make the URL simpler
    $elems = parse_url($url);
    $url = $elems['scheme'].'://'.$elems['host'];

    # load site
    $output = file_get_contents($url);

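    # look for the shortcut icon inside the loaded page
    # (only matches when href directly follows rel="shortcut icon")
    $regex_pattern = "/rel=\"shortcut icon\" (?:href=[\'\"]([^\'\"]+)[\'\"])?/";
    preg_match_all($regex_pattern, $output, $matches);

    # the href group is optional, so also reject an empty capture
    if(!empty($matches[1][0])){
        $favicon = $matches[1][0];

        # check if absolute url or relative path
        $favicon_elems = parse_url($favicon);

        # if relative
        if(!isset($favicon_elems['host'])){
            # ltrim() avoids a double slash when the href starts with "/"
            $favicon = $url . '/' . ltrim($favicon, '/');
        }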

        return $favicon;
    }

    return false;
}

Answer by mdec

According to Wikipedia, there are two major methods that websites can use to have a favicon picked up by a browser. The first is, as Steve mentioned, storing the icon as favicon.ico in the root directory of the web server. The second is to reference the favicon via the HTML link tag.

To cover all of these cases, the best idea would be to test for the presence of the favicon.ico file first and, if it is not present, search the source (limited to the HTML head node) for either the <link rel="icon" or the <link rel="shortcut icon" part until you find the favicon. It is up to you whether you choose to use regex or some other string search option (not to mention the built-in PHP ones). Finally, this question may be of some help to you.
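
The order described above could be sketched roughly as follows. This is an illustration under assumptions rather than a complete solution: the function names `urlExists`, `findIconHref` and `locateFavicon` are made up for this example, the HTML is parsed with DOMDocument instead of regex, and the returned href may still be relative.

```php
<?php
// True if any status line returned for $url is a 200 (get_headers follows redirects).
function urlExists(string $url): bool
{
    $headers = @get_headers($url);
    if ($headers === false) {
        return false;
    }
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+200\b#', $line)) {
            return true;
        }
    }
    return false;
}

// Href of the first <link> in <head> whose rel contains the token "icon", or null.
function findIconHref(string $html): ?string
{
    if (trim($html) === '') {
        return null;
    }
    $doc = new DOMDocument();
    // Collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//head/link[@rel and @href]') as $link) {
        $tokens = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        if (in_array('icon', $tokens, true)) {
            return $link->getAttribute('href');
        }
    }
    return null;
}

// First try /favicon.ico in the site root, then fall back to the <link> tags.
function locateFavicon(string $pageUrl): ?string
{
    $parts = parse_url($pageUrl);
    if (empty($parts['scheme']) || empty($parts['host'])) {
        return null;
    }
    $root = $parts['scheme'] . '://' . $parts['host'];
    if (urlExists($root . '/favicon.ico')) {
        return $root . '/favicon.ico';
    }
    $html = @file_get_contents($pageUrl);
    return $html === false ? null : findIconHref($html);
}
```

Splitting the rel value into tokens means "icon", "shortcut icon" and "Shortcut Icon" are all treated the same way.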

Answer by Vivek

First method: check for favicon.ico in the site root. If it is found, it is shown; otherwise it is not.

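<?php
$userPath = $_POST["url"];
$path = "http://www." . $userPath . "/favicon.ico";
// get_headers() returns false when the request fails
$header = @get_headers($path);
if ($header !== false && preg_match("|200|", $header[0])) {
    // Escape the user-supplied value before echoing it into HTML
    echo '<img src="' . htmlspecialchars($path) . '">';
} else {
    echo "<span class=error>Not found</span>";
}
?>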

In the other method, you search the page for the icon link and get that icon file:

<?php
$website = $_POST["url"];
$fevicon = getFavicon($website);
// Assumes the href is relative to the site root; an absolute href would need separate handling
echo '<img src="http://www.' . $website . '/' . $fevicon . '">';

function getFavicon($site)
{
    $html = file_get_contents("http://www." . $site);
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $links = $dom->getElementsByTagName('link');
    $fevicon = '';

    for ($i = 0; $i < $links->length; $i++) {
        $link = $links->item($i);
        // Compare case-insensitively so "Shortcut Icon" and "shortcut icon" both match
        $rel = strtolower($link->getAttribute('rel'));
        if ($rel == 'icon' || $rel == 'shortcut icon') {
            $fevicon = $link->getAttribute('href');
        }
    }
    return $fevicon;
}
?>

Answer by Jaime Bellmyer

I've implemented a favicon grabber of my own, and I detailed the usage in another Stack Overflow post: Get website's favicon with JS

Thanks, and let me know if it helps you. Also, any feedback is greatly appreciated.

Answer by Vince

See this answer: https://stackoverflow.com/a/22771267. It's an easy-to-use PHP class that gets the favicon URL and downloads it. It also gives you some information about the favicon, such as the file type or how the favicon was found (default URL, <link> tag, ...):

<?php
require 'FaviconDownloader.class.php';
$favicon = new FaviconDownloader('https://code.google.com/p/chromium/issues/detail?id=236848');

if($favicon->icoExists){
    echo "Favicon found : ".$favicon->icoUrl."\n";

    // Saving favicon to file
    $filename = 'favicon-'.time().'.'.$favicon->icoType;
    file_put_contents($filename, $favicon->icoData);
    echo "Saved to ".$filename."\n\n";
} else {
    echo "No favicon for ".$favicon->url."\n\n";
}

$favicon->debug();
/*
FaviconDownloader Object
(
    [url] => https://code.google.com/p/chromium/issues/detail?id=236848
    [pageUrl] => https://code.google.com/p/chromium/issues/detail?id=236848
    [siteUrl] => https://code.google.com/
    [icoUrl] => https://ssl.gstatic.com/codesite/ph/images/phosting.ico
    [icoType] => ico
    [findMethod] => head absolue_full
    [error] => 
    [icoExists] => 1
    [icoMd5] => a6cd47e00e3acbddd2e8a760dfe64cdc
)
*/
?>

Answer by Thamaraiselvam

$url = 'http://thamaraiselvam.strikingly.com/';
$doc = new DOMDocument();
$doc->strictErrorChecking = FALSE;
@$doc->loadHTML(file_get_contents($url));
$xml = simplexml_import_dom($doc);
$arr = $xml->xpath('//link[@rel="shortcut icon"]');
if (!empty($arr[0]['href'])) {
    echo "<img src='".$arr[0]['href']."'>";
} else {
    // rtrim() avoids a double slash because $url already ends with "/"
    echo "<img src='".rtrim($url, '/')."/favicon.ico'>";
}

Answer by pc_

I changed Vivek's second method a bit and added this function to make the path absolute. It looks like this:

<?php
$website = $_GET['u'];
$fevicon = getFavicon($website);
echo '<img src="'.path_to_absolute($fevicon, $website).'">';

function getFavicon($site)
{
    $html = file_get_contents($site);
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $links = $dom->getElementsByTagName('link');
    $fevicon = '';

    for ($i = 0; $i < $links->length; $i++) {
        $link = $links->item($i);
        if ($link->getAttribute('rel')=='icon' || $link->getAttribute('rel')=="Shortcut Icon" || $link->getAttribute('rel')=="shortcut icon") {
            $fevicon = $link->getAttribute('href');
        }
    }
    return $fevicon;
}

// transform to absolute path function...
function path_to_absolute($rel, $base)
{
    /* nothing to resolve if no icon link was found */
    if ($rel == '') return $rel;
    /* return if already absolute URL */
    if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;
    /* queries and anchors */
    if ($rel[0]=='#' || $rel[0]=='?') return $base.$rel;
    /* parse base URL and convert to local variables:
       $scheme, $host, $path */
    extract(parse_url($base));
    /* a base URL such as http://www.example.com has no path component */
    if (!isset($path)) $path = '';
    /* remove non-directory element from path */
    $path = preg_replace('#/[^/]*$#', '', $path);
    /* destroy path if relative url points to root */
    if ($rel[0] == '/') $path = '';
    /* dirty absolute URL */
    $abs = "$host$path/$rel";
    /* replace '//' or '/./' or '/foo/../' with '/' */
    $re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
    for($n=1; $n>0; $abs=preg_replace($re, '/', $abs, -1, $n)) {}
    /* absolute URL is ready! */
    return $scheme.'://'.$abs;
}

?>

Of course, you call it with https://www.domain.tld/favicon/this_script.php?u=http://www.example.com

It still can't catch all options, but now the absolute path is resolved. Hope it helps.