How can I check if a URL exists via PHP?

Question

Asked by X10nD

How do I check if a URL exists (not 404) in PHP?

Answer 1

Answered by karim79

Here:

$file = 'http://www.domain.com/somefile.jpg';
$file_headers = @get_headers($file);
if(!$file_headers || $file_headers[0] == 'HTTP/1.1 404 Not Found') {
    $exists = false;
}
else {
    $exists = true;
}
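The exact string comparison above only catches one spelling of the status line. As a rough sketch (the helper names `statusCodeFromLine` and `urlExistsByHeaders` are made up for illustration and are not part of karim79's answer), the numeric code can be pulled out of the status line so that variants such as `HTTP/1.0 404 Not Found` or `HTTP/2 404` are handled too:

```php
<?php
// Hypothetical helper (not from the original answer): extract the numeric
// status code from a status line such as "HTTP/1.1 404 Not Found".
// Returns an int, or null when the line is not an HTTP status line.
function statusCodeFromLine($line)
{
    if (is_string($line) && preg_match('/^HTTP\/\S+\s+(\d{3})\b/', $line, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical wrapper: same idea as the snippet above, but comparing the
// parsed code instead of the whole status line.
function urlExistsByHeaders($url)
{
    $headers = @get_headers($url);
    if ($headers === false) {
        return false; // request failed (DNS error, timeout, ...)
    }
    $code = statusCodeFromLine($headers[0]);
    return $code !== null && $code !== 404;
}
```

Like the original, this only treats 404 as "does not exist"; whether 403, 410 or 5xx should count as missing is a decision left to the caller.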

From here, and right below the above post, there's a curl solution:

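function url_exists($url) {
    // NOTE: curl_init() only creates a handle; it sends no request,
    // so on its own this does not verify that the URL actually responds.
    if (!$fp = curl_init($url)) return false;
    return true;
}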

Answer 2

Answered by MoonLite

When figuring out whether a URL exists from PHP, there are a few things to pay attention to:

Is the URL itself valid (a string, not empty, well-formed syntax)? This is quick to check server-side.
Waiting for a response might take time and block code execution.
Not all headers returned by get_headers() are well formed.
Use curl (if you can).
Avoid fetching the entire body/content; request only the headers.
Consider redirecting URLs:
- Do you want the first code returned?
- Or follow all redirects and return the last code?
- You might end up with a 200, but the page could still redirect using meta tags or JavaScript. Figuring out what happens after that is tough.

Keep in mind that whatever method you use, it takes time to wait for a response.
All code might (and probably will) halt until you either know the result or the requests have timed out.

For example, the code below could take a LONG time to display the page if the URLs are invalid or unreachable:

<?php
$urls = getUrls(); // some function getting say 10 or more external links

foreach($urls as $k=>$url){
  // this could potentially take 0-30 seconds each
  // (more or less depending on connection, target site, timeout settings...)
  if( ! isValidUrl($url) ){
    unset($urls[$k]);
  }
}

echo "yay all done! now show my site";
foreach($urls as $url){
  echo "<a href=\"{$url}\">{$url}</a><br/>";
}
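One way to keep a loop like this from blocking once per URL is to issue the requests concurrently. The following is a minimal sketch using PHP's curl_multi functions, not part of the original answer: `fetchStatusCodes` and `keepReachable` are hypothetical names, the 5-second timeout is an arbitrary assumption, and treating only 200 as "reachable" simply mirrors the isValidUrl() check.

```php
<?php
// Hypothetical helper: send header-only requests for all URLs at once and
// return the final HTTP status code per array key (0 means no response).
function fetchStatusCodes(array $urls, int $timeoutSeconds = 5): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not print anything
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // report the last code
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi, 1.0); // wait for activity, at most 1 second
        }
    } while ($running && $status === CURLM_OK);

    $codes = [];
    foreach ($handles as $key => $ch) {
        $codes[$key] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);
    return $codes;
}

// Hypothetical helper: keep only the URLs whose status code was 200.
function keepReachable(array $urls, array $codes): array
{
    return array_filter($urls, function ($key) use ($codes) {
        return isset($codes[$key]) && $codes[$key] === 200;
    }, ARRAY_FILTER_USE_KEY);
}
```

With this shape, `$urls = keepReachable($urls, fetchStatusCodes($urls));` would replace the loop, and the total wait should be closer to the slowest single request than to the sum of all of them.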

The functions below could be helpful; you will probably want to modify them to suit your needs:

    function isValidUrl($url){
        // first do some quick sanity checks:
        if(!$url || !is_string($url)){
            return false;
        }
        // quick check url is roughly a valid http request: ( http://blah/... ) 
        if( ! preg_match('/^http(s)?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(\/.*)?$/i', $url) ){
            return false;
        }
        // the next bit could be slow:
        if(getHttpResponseCode_using_curl($url) != 200){
//      if(getHttpResponseCode_using_getheaders($url) != 200){  // use this one if you can't use curl
            return false;
        }
        // all good!
        return true;
    }

    function getHttpResponseCode_using_curl($url, $followredirects = true){
        // returns int responsecode, or false (if url does not exist or connection timeout occurs)
        // NOTE: this could take anywhere from 0 to 30 seconds, blocking further code execution (more or less, depending on connection, target site, and local timeout settings)
        // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
        // if $followredirects == true : return the LAST  known httpcode (when redirected)
        if(! $url || ! is_string($url)){
            return false;
        }
        $ch = @curl_init($url);
        if($ch === false){
            return false;
        }
        @curl_setopt($ch, CURLOPT_HEADER         ,true);    // we want headers
        @curl_setopt($ch, CURLOPT_NOBODY         ,true);    // don't need body
        @curl_setopt($ch, CURLOPT_RETURNTRANSFER ,true);    // catch output (do NOT print!)
        if($followredirects){
            @curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,true);
            @curl_setopt($ch, CURLOPT_MAXREDIRS      ,10);  // fairly random number, but could prevent unwanted endless redirects with followlocation=true
        }else{
            @curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,false);
        }
//      @curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,5);   // fairly random number (seconds)... but could prevent waiting forever to get a result
//      @curl_setopt($ch, CURLOPT_TIMEOUT        ,6);   // fairly random number (seconds)... but could prevent waiting forever to get a result
//      @curl_setopt($ch, CURLOPT_USERAGENT      ,"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1");   // pretend we're a regular browser
        @curl_exec($ch);
        if(@curl_errno($ch)){   // should be 0
            @curl_close($ch);
            return false;
        }
        $code = @curl_getinfo($ch, CURLINFO_HTTP_CODE); // note: php.net documentation shows this returns a string, but really it returns an int
        @curl_close($ch);
        return $code;
    }

    function getHttpResponseCode_using_getheaders($url, $followredirects = true){
        // returns string responsecode, or false if no responsecode found in headers (or url does not exist)
        // NOTE: this could take anywhere from 0 to 30 seconds, blocking further code execution (more or less, depending on connection, target site, and local timeout settings)
        // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
        // if $followredirects == true : return the LAST  known httpcode (when redirected)
        if(! $url || ! is_string($url)){
            return false;
        }
        $headers = @get_headers($url);
        if($headers && is_array($headers)){
            if($followredirects){
                // we want the last status code, so reverse the array and start at the end:
                $headers = array_reverse($headers);
            }
            foreach($headers as $hline){
                // search for things like "HTTP/1.1 200 OK" , "HTTP/1.0 200 OK" , "HTTP/1.1 301 PERMANENTLY MOVED" , "HTTP/1.1 404 Not Found" , etc.
                // note that the exact syntax/version/output differs, so there is some string magic involved here
                if(preg_match('/^HTTP\/\S+\s+([1-9][0-9][0-9])\s+.*/', $hline, $matches) ){// "HTTP/*** ### ***"
                    $code = $matches[1];
                    return $code;
                }
            }
            // no HTTP/xxx found in headers:
            return false;
        }
        // no headers :
        return false;
    }
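As an alternative to the hand-written regex in isValidUrl(), the quick syntax check could lean on PHP's filter extension. This is only a sketch under assumptions: `hasHttpUrlSyntax` is a hypothetical name, and FILTER_VALIDATE_URL accepts ASCII URLs only, so internationalized domain names would be rejected.

```php
<?php
// Hypothetical alternative to the regex pre-check: let filter_var()
// validate the syntax, then restrict the scheme to http/https.
function hasHttpUrlSyntax($url)
{
    if (!is_string($url) || $url === '') {
        return false;
    }
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return $scheme === 'http' || $scheme === 'https';
}
```

This check is purely local and fast; it says nothing about whether the server actually answers, so the slower response-code check is still needed afterwards.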

Answer 3

Answered by lunarnet76

$headers = @get_headers($this->_value);
if(!$headers || strpos($headers[0],'200')===false)return false;

This returns false whenever the first status line of the response is anything other than 200 OK (or when no headers come back at all).

Answer 4

Answered by Minhaz

You cannot use curl on certain servers; in that case you can use this code:

<?php
$url = 'http://www.example.com';
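$array = @get_headers($url);           // false when the request fails
$string = $array ? $array[0] : '';     // first status line, e.g. "HTTP/1.1 200 OK"
if(strpos($string,"200") !== false)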
  {
    echo 'url exists';
  }
  else
  {
    echo 'url does not exist';
  }
?>
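When curl is unavailable, get_headers() can still be told to send a HEAD request and to give up after a timeout by passing it a stream context. The sketch below is an illustration rather than part of the answer: the helper names and the 5-second default are assumptions, and the context argument of get_headers() requires PHP 7.1 or later.

```php
<?php
// Hypothetical helper: build a stream context that makes get_headers()
// send a HEAD request and stop waiting after $timeoutSeconds.
function headRequestContext($timeoutSeconds = 5)
{
    return stream_context_create(array(
        'http' => array(
            'method'  => 'HEAD',
            'timeout' => $timeoutSeconds,
        ),
    ));
}

// Hypothetical helper: true when the first status line reports 200.
function urlRespondsWith200($url)
{
    $headers = @get_headers($url, false, headRequestContext());
    return is_array($headers) && strpos($headers[0], ' 200') !== false;
}
```

As in the code above, only the first status line is inspected, so a URL that answers with a redirect is reported as "not 200" even if the redirect target exists.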

Answer 5

Answered by Randy Skretka

$url = 'http://google.com';
$not_url = 'stp://google.com';

if (@file_get_contents($url)): echo "Found '$url'!";
else: echo "Can't find '$url'.";
endif;
if (@file_get_contents($not_url)): echo "Found '$not_url'!";
else: echo "Can't find '$not_url'.";
endif;

// Found 'http://google.com'!Can't find 'stp://google.com'.

Answer 6

Answered by leela

function URLIsValid($URL)
{
    $exists = true;
    $file_headers = @get_headers($URL);
    $InvalidHeaders = array('404', '403', '500');
    foreach($InvalidHeaders as $HeaderVal)
    {
            if(strstr($file_headers[0], $HeaderVal))
            {
                    $exists = false;
                    break;
            }
    }
    return $exists;
}

Answer 7

Answered by Ehsan

I use this function:

/**
 * @param $url
 * @param array $options
 * @return string
 * @throws Exception
 */
function checkURL($url, array $options = array()) {
    if (empty($url)) {
        throw new Exception('URL is empty');
    }

    // list of HTTP status codes
    $httpStatusCodes = array(
        100 => 'Continue',
        101 => 'Switching Protocols',
        102 => 'Processing',
        200 => 'OK',
        201 => 'Created',
        202 => 'Accepted',
        203 => 'Non-Authoritative Information',
        204 => 'No Content',
        205 => 'Reset Content',
        206 => 'Partial Content',
        207 => 'Multi-Status',
        208 => 'Already Reported',
        226 => 'IM Used',
        300 => 'Multiple Choices',
        301 => 'Moved Permanently',
        302 => 'Found',
        303 => 'See Other',
        304 => 'Not Modified',
        305 => 'Use Proxy',
        306 => 'Switch Proxy',
        307 => 'Temporary Redirect',
        308 => 'Permanent Redirect',
        400 => 'Bad Request',
        401 => 'Unauthorized',
        402 => 'Payment Required',
        403 => 'Forbidden',
        404 => 'Not Found',
        405 => 'Method Not Allowed',
        406 => 'Not Acceptable',
        407 => 'Proxy Authentication Required',
        408 => 'Request Timeout',
        409 => 'Conflict',
        410 => 'Gone',
        411 => 'Length Required',
        412 => 'Precondition Failed',
        413 => 'Payload Too Large',
        414 => 'Request-URI Too Long',
        415 => 'Unsupported Media Type',
        416 => 'Requested Range Not Satisfiable',
        417 => 'Expectation Failed',
        418 => 'I\'m a teapot',
        422 => 'Unprocessable Entity',
        423 => 'Locked',
        424 => 'Failed Dependency',
        425 => 'Unordered Collection',
        426 => 'Upgrade Required',
        428 => 'Precondition Required',
        429 => 'Too Many Requests',
        431 => 'Request Header Fields Too Large',
        449 => 'Retry With',
        450 => 'Blocked by Windows Parental Controls',
        500 => 'Internal Server Error',
        501 => 'Not Implemented',
        502 => 'Bad Gateway',
        503 => 'Service Unavailable',
        504 => 'Gateway Timeout',
        505 => 'HTTP Version Not Supported',
        506 => 'Variant Also Negotiates',
        507 => 'Insufficient Storage',
        508 => 'Loop Detected',
        509 => 'Bandwidth Limit Exceeded',
        510 => 'Not Extended',
        511 => 'Network Authentication Required',
        599 => 'Network Connect Timeout Error'
    );

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    if (isset($options['timeout'])) {
        $timeout = (int) $options['timeout'];
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    }

    curl_exec($ch);
    $returnedStatusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if (array_key_exists($returnedStatusCode, $httpStatusCodes)) {
        return "URL: '{$url}' - Error code: {$returnedStatusCode} - Definition: {$httpStatusCodes[$returnedStatusCode]}";
    } else {
        return "'{$url}' does not exist";
    }
}
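If a caller needs a decision rather than a message string, the status code can be grouped by its first digit instead of being looked up in a table. A small sketch, with `statusClass` as a hypothetical helper name that is not part of the function above:

```php
<?php
// Hypothetical helper: map a numeric HTTP status code to its class.
function statusClass($code)
{
    $code = (int) $code;
    if ($code >= 100 && $code < 200) return 'informational';
    if ($code >= 200 && $code < 300) return 'success';
    if ($code >= 300 && $code < 400) return 'redirection';
    if ($code >= 400 && $code < 500) return 'client error';
    if ($code >= 500 && $code < 600) return 'server error';
    return 'no response'; // e.g. 0 from curl_getinfo() when the request failed
}
```

For example, a caller could treat `statusClass($returnedStatusCode) === 'success'` as "the URL exists" and everything else as a failure to report.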

Answer 8

Answered by Jonathan Parent Lévesque

karim79's get_headers() solution didn't work for me, as I got crazy results with Pinterest.

get_headers(): SSL operation failed with code 1. OpenSSL Error messages: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

get_headers(): Failed to enable crypto

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

get_headers(https://www.pinterest.com/jonathan_parl/): failed to open stream: operation failed

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

Anyway, this developer demonstrates that cURL is way faster than get_headers():

http://php.net/manual/fr/function.get-headers.php#104723

Since many people asked karim79 to fix his cURL solution, here's the solution I built today.

/**
* Send an HTTP request to the $url and check the header posted back.
*
* @param $url String url to which we must send the request.
* @param $failCodeList Int array list of code for which the page is considered invalid.
*
* @return Boolean
*/
public static function isUrlExists($url, array $failCodeList = array(404)){

    $exists = false;

    if(!StringManager::stringStartWith($url, "http") and !StringManager::stringStartWith($url, "ftp")){

        $url = "https://" . $url;
    }

    if (preg_match(RegularExpression::URL, $url)){

        $handle = curl_init($url);


        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);

        curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);

        curl_setopt($handle, CURLOPT_HEADER, true);

        curl_setopt($handle, CURLOPT_NOBODY, true);

        curl_setopt($handle, CURLOPT_USERAGENT, true);


        $headers = curl_exec($handle);

        curl_close($handle);


        if (empty($failCodeList) or !is_array($failCodeList)){

            $failCodeList = array(404); 
        }

        if (!empty($headers)){

            $exists = true;

            $headers = explode(PHP_EOL, $headers);

            foreach($failCodeList as $code){

                if (is_numeric($code) and strpos($headers[0], strval($code)) !== false){

                    $exists = false;

                    break;  
                }
            }
        }
    }

    return $exists;
}

Let me explain the curl options:

CURLOPT_RETURNTRANSFER: return a string instead of displaying the calling page on the screen.

CURLOPT_SSL_VERIFYPEER: set to false here, so cURL won't verify the peer's certificate.

CURLOPT_HEADER: include the header in the string.

CURLOPT_NOBODY: don't include the body in the string.

CURLOPT_USERAGENT: some sites need it to function properly (for example: https://plus.google.com).
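The same options can also be applied in one call with curl_setopt_array(). The sketch below is a variation, not the author's code: the helper names are hypothetical, the user-agent string is an arbitrary example value, and certificate verification is deliberately left enabled here, which differs from the function above.

```php
<?php
// Hypothetical helper returning the options discussed above as one array.
// Assumptions: certificate verification stays on, and the default
// user-agent string is just an example value.
function headerCheckOptions($userAgent = 'Mozilla/5.0 (compatible; UrlChecker/1.0)')
{
    return array(
        CURLOPT_RETURNTRANSFER => true,       // return the response as a string
        CURLOPT_SSL_VERIFYPEER => true,       // verify the peer's certificate
        CURLOPT_HEADER         => true,       // include the headers in the string
        CURLOPT_NOBODY         => true,       // skip the body (HEAD request)
        CURLOPT_USERAGENT      => $userAgent, // this option expects a string
    );
}

// Hypothetical helper: fetch only the raw response headers of $url.
function fetchRawHeaders($url)
{
    $handle = curl_init($url);
    curl_setopt_array($handle, headerCheckOptions());
    $headers = curl_exec($handle); // string on success, false on failure
    curl_close($handle);
    return $headers;
}
```

If a site fails certificate verification, as in the Pinterest errors quoted earlier, fixing the local CA bundle is usually preferable to turning verification off, though that is a judgment call for each deployment.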

Additional note: In this function I'm using Diego Perini's regex for validating the URL before sending the request:

const URL = "%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu"; //@copyright Diego Perini

Additional note 2: I explode the header string and use $headers[0] to be sure to validate only the return code and message (example: 200, 404, 405, etc.).

Additional note 3: Sometimes validating only the code 404 is not enough (see the unit test), so there's an optional $failCodeList parameter to supply the full list of codes to reject.

And, of course, here's the unit test (including all the popular social networks) to validate my code:

public function testIsUrlExists(){

//invalid
$this->assertFalse(ToolManager::isUrlExists("woot"));

$this->assertFalse(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque4545646456"));

$this->assertFalse(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque890800"));

$this->assertFalse(ToolManager::isUrlExists("https://instagram.com/mariloubiz1232132/", array(404, 405)));

$this->assertFalse(ToolManager::isUrlExists("https://www.pinterest.com/jonathan_parl1231/"));

$this->assertFalse(ToolManager::isUrlExists("https://regex101.com/546465465456"));

$this->assertFalse(ToolManager::isUrlExists("https://twitter.com/arcadefire4566546"));

$this->assertFalse(ToolManager::isUrlExists("https://vimeo.com/**($%?%$", array(400, 405)));

$this->assertFalse(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666456456456"));


//valid
$this->assertTrue(ToolManager::isUrlExists("www.google.ca"));

$this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));

$this->assertTrue(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque"));

$this->assertTrue(ToolManager::isUrlExists("https://instagram.com/mariloubiz/"));

$this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));

$this->assertTrue(ToolManager::isUrlExists("https://www.pinterest.com/"));

$this->assertTrue(ToolManager::isUrlExists("https://regex101.com"));

$this->assertTrue(ToolManager::isUrlExists("https://twitter.com/arcadefire"));

$this->assertTrue(ToolManager::isUrlExists("https://vimeo.com/"));

$this->assertTrue(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666"));
}

Great success to all,

Jonathan Parent-Lévesque from Montreal

Answer 9

Answered by Spir

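function urlIsOk($url)
{
    $headers = @get_headers($url);
    if ($headers === false)
    {
        return false; // request failed; without this check intval() below yields 0, which is < 400
    }
    // assumes a status line of the form "HTTP/1.1 200 OK": the code starts at offset 9
    $httpStatus = intval(substr($headers[0], 9, 3));
    if ($httpStatus<400)
    {
        return true;
    }
    return false;
}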

Answer 10

Answered by Sebastian Lasse

Pretty fast:

function http_response($url){
    $resURL = curl_init(); 
    curl_setopt($resURL, CURLOPT_URL, $url); 
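    curl_setopt($resURL, CURLOPT_RETURNTRANSFER, 1); // capture the output instead of printing it
    curl_setopt($resURL, CURLOPT_NOBODY, 1);         // headers only; the original registered a header callback ('curlHeaderCallback') that was never defined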
    curl_setopt($resURL, CURLOPT_FAILONERROR, 1); 
    curl_exec ($resURL); 
    $intReturnCode = curl_getinfo($resURL, CURLINFO_HTTP_CODE); 
    curl_close ($resURL); 
    if ($intReturnCode != 200 && $intReturnCode != 302 && $intReturnCode != 304) { return 0; } else return 1;
}

echo 'google:';
echo http_response('http://www.google.com');
echo '/ ogogle:';
echo http_response('http://www.ogogle.com');
