PHP: Why does cURL return an empty string?

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/14679886/

Time: 2020-08-25 07:47:15  Source: igfitidea

Why does cURL return an empty string?

php curl domdocument

Asked by Nick

I'm having a problem with PHP's cURL returning an empty string for some URLs. I'm trying to parse the OG metadata of different webpages, and it works with every website I've tried except NYTimes. Here is my code so far.

print_r(get_og_metadata('http://somewebsite.com'));


public function get_data($url)
{
    $ch = curl_init();
    $timeout = 5;
    // the url to fetch
    curl_setopt($ch, CURLOPT_URL, $url);
    // return result as a string rather than direct output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // set max time to wait for the connection (not for the whole transfer)
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

public function get_og_metadata($url)
{
    libxml_use_internal_errors(TRUE);
    $data = $this->get_data($url);
    $doc = new DOMDocument();
    $doc->loadHTML($data);

    $xpath = new DOMXPath($doc);
    $query = '//*/meta[starts-with(@property, \'og:\')]';

    $metadatas = $xpath->query($query);
    $result = array();
    foreach($metadatas as $metadata)
    {
        $property = $metadata->getAttribute('property');
        $content = $metadata->getAttribute('content');
        $result[$property] = $content;
    }

    return $result;
}

Accepted answer by ZirconCode

My guess is that a site like the New York Times has protection against such behavior. Most likely this is based on the user agent, which you can fake like so:

curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17');

This is the most common user agent, by the way.
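
Since the user agent is only a guess, it may help to look at what cURL itself reports before changing anything. The following is a minimal sketch, not part of the original answer: the function name fetch_with_diagnostics and the shape of the returned array are made up for illustration. It collects the cURL error and the HTTP status code alongside the body, so an empty result can be told apart from a transport failure, a redirect, or a blocked request.

```php
<?php
// Hypothetical helper (not from the original thread): fetch a URL and
// return the body together with the details cURL knows about the request.
function fetch_with_diagnostics(string $url, ?string $userAgent = null): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    if ($userAgent !== null) {
        curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    }

    // curl_exec() returns false when the transfer itself fails.
    $body = curl_exec($ch);

    return [
        'body'        => $body === false ? '' : $body,
        'errno'       => curl_errno($ch),   // 0 means no transport error
        'error'       => curl_error($ch),   // human-readable message, '' if none
        'http_code'   => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'redirect_to' => curl_getinfo($ch, CURLINFO_REDIRECT_URL),
    ];
}
```

Calling it once without a user agent and once with the string above, then comparing 'http_code' and 'error', should show whether the user agent is really what makes the difference.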

Answer by Abhishek Goel

These 5 lines did the magic for me.

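// pretend to be a desktop browser
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17');
// set the Referer header automatically when following a redirect
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
// return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// follow Location: redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
// write debugging output to STDERR
curl_setopt($ch, CURLOPT_VERBOSE, 1);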
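
The same five options can also be set in one call with curl_setopt_array. This is just a sketch of that variant: the helper name browser_like_options is an assumption for illustration, and the option values simply mirror the lines above.

```php
<?php
// Hypothetical helper name; the values mirror the five options in the answer.
function browser_like_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17',
        CURLOPT_AUTOREFERER    => true,  // set Referer when following a redirect
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow Location: headers
        CURLOPT_VERBOSE        => true,  // debugging output goes to STDERR
    ];
}

// Apply all options to a fresh handle and run the request.
// Returns the body as a string, or false if the transfer failed.
function fetch_like_browser(string $url)
{
    $ch = curl_init();
    curl_setopt_array($ch, browser_like_options($url));
    return curl_exec($ch);
}
```

With this in place, fetch_like_browser('http://somewebsite.com') could stand in for the get_data method from the question. If the empty string was caused by an unfollowed redirect, CURLOPT_FOLLOWLOCATION is probably the option that matters most here.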

Answer by Michael Davidson

(The other answer is mine too.)

This is what did it for me. It was looking for SSL verification, which I happened not to need in this specific case.

curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
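
If verification is needed after all, one possible alternative is to leave it switched on and tell cURL where to find a CA bundle. This is only a sketch under an assumption the answer does not confirm, namely that the request failed because no CA certificates were available to PHP. The helper name verified_tls_options and the bundle path are placeholders.

```php
<?php
// Hypothetical helper: options that keep certificate checks enabled.
// $caBundlePath is a placeholder for a CA bundle file on your system.
function verified_tls_options(string $caBundlePath): array
{
    return [
        CURLOPT_SSL_VERIFYPEER => true,          // check the certificate chain
        CURLOPT_SSL_VERIFYHOST => 2,             // check that the host name matches
        CURLOPT_CAINFO         => $caBundlePath, // CA certificates to trust
    ];
}
```

These would be applied with curl_setopt_array($ch, verified_tls_options('/path/to/cacert.pem')), where the path is again just an example. If curl_exec still returns false, curl_error($ch) should say what went wrong with the certificate.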

Answer by Michael Davidson

This is what did it for me. It was looking for SSL verification, which I happened not to need in this specific case.

curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);