Retrieving only headers via curl in PHP

Question

Asked by Krule

Actually I have two questions.

(1) Is there any reduction in processing power or bandwidth used on the remote server if I retrieve only the headers, as opposed to the full page, using PHP and curl?

(2) Since I think (and I might be wrong) that the answer to the first question is YES, I am trying to get only the last modified date or If-Modified-Since header of a remote file, in order to compare it with the date and time of my locally stored data, so that I can store the file locally if it has changed. However, my script seems unable to fetch that piece of information. I get NULL when I run this:

class last_change {

 public $last_change;

 function set_last_change() {
  $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, "http://url/file.xml");
    curl_setopt($curl, CURLOPT_HEADER, true);
    curl_setopt($curl, CURLOPT_FILETIME, true);
    curl_setopt($curl, CURLOPT_NOBODY, true);
  // $header = curl_exec($curl);
  $this -> last_change = curl_getinfo($header);
  curl_close($curl);
 }

 function get_last_change() {
  return $this -> last_change['datetime']; // I have tested with Last-Modified & If-Modified-Since to no avail
 }

}

If $header = curl_exec($curl) is uncommented, the header data is displayed even though I haven't asked for it to be printed, and looks as follows:

HTTP/1.1 200 OK
Date: Fri, 04 Sep 2009 12:15:51 GMT
Server: Apache/2.2.8 (Linux/SUSE)
Last-Modified: Thu, 03 Sep 2009 12:46:54 GMT
ETag: "198054-118c-472abc735ab80"
Accept-Ranges: bytes
Content-Length: 4492
Content-Type: text/xml

Based on that, 'Last-Modified' is returned.

So, what am I doing wrong?

Answer 1

Accepted answer by GZipp

You are passing $header to curl_getinfo(). It should be $curl (the cURL handle). You can get just the filetime by passing CURLINFO_FILETIME as the second parameter to curl_getinfo(). (Often the filetime is unavailable, in which case it is reported as -1.)

Your class seems wasteful, though, throwing away a lot of information that could be useful. Here's another way it might be done:

class URIInfo 
{
    public $info;
    public $header;
    private $url;

    public function __construct($url)
    {
        $this->url = $url;
        $this->setData();
    }

    public function setData() 
    {
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $this->url);
        curl_setopt($curl, CURLOPT_FILETIME, true);
        curl_setopt($curl, CURLOPT_NOBODY, true);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl, CURLOPT_HEADER, true);
        $this->header = curl_exec($curl);
        $this->info = curl_getinfo($curl);
        curl_close($curl);
    }

    public function getFiletime() 
    {
        return $this->info['filetime'];
    }

    // Other functions can be added to retrieve other information.
}

$uri_info = new URIInfo('http://www.codinghorror.com/blog/');
$filetime = $uri_info->getFiletime();
if ($filetime != -1) {
    echo date('Y-m-d H:i:s', $filetime);
} else {
    echo 'filetime not available';
}
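
To tie this back to the goal stated in the question (refreshing a local copy only when the remote file has changed), the filetime can be compared with the local file's modification time. The following is a minimal sketch: the helper name needs_refresh() and the choice to refresh whenever the filetime is unknown (-1) are assumptions made for illustration, not part of the class above.

```php
<?php
// Hypothetical helper (not from the answer above): decide whether a local
// copy should be re-downloaded, given the remote filetime reported by cURL.
function needs_refresh(int $remoteFiletime, string $localPath): bool
{
    // No local copy yet: always fetch.
    if (!is_file($localPath)) {
        return true;
    }
    // cURL reports -1 when no usable modification time is available.
    // Assumption: refresh in that case rather than risk keeping stale data.
    if ($remoteFiletime === -1) {
        return true;
    }
    // Refresh only if the remote file is newer than the local copy.
    return $remoteFiletime > filemtime($localPath);
}
```

It could be called as needs_refresh($uri_info->getFiletime(), $localPath), where $localPath is wherever the local copy is stored.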

Yes, the load will be lighter on the server, since it is returning only the HTTP headers (responding, after all, to a HEAD request). How much lighter will vary greatly.

Answer 2

Answered by patrick

Why use cURL for this? There is a PHP function for that:

$headers=get_headers("http://www.amazingjokes.com/img/2014/530c9613d29bd_CountvonCount.jpg");
print_r($headers);

returns the following:

Array
(
    [0] => HTTP/1.1 200 OK
    [1] => Date: Tue, 11 Mar 2014 22:44:38 GMT
    [2] => Server: Apache
    [3] => Last-Modified: Tue, 25 Feb 2014 14:08:40 GMT
    [4] => ETag: "54e35e8-8873-4f33ba00673f4"
    [5] => Accept-Ranges: bytes
    [6] => Content-Length: 34931
    [7] => Connection: close
    [8] => Content-Type: image/jpeg
)

It should be easy to get the Content-Type after this.

You can also pass 1 as the second argument (format) to get_headers(), so that the result is keyed by header name:

$headers=get_headers("http://www.amazingjokes.com/img/2014/530c9613d29bd_CountvonCount.jpg",1);
print_r($headers);

This will return the following:

Array
(
    [0] => HTTP/1.1 200 OK
    [Date] => Tue, 11 Mar 2014 22:44:38 GMT
    [Server] => Apache
    [Last-Modified] => Tue, 25 Feb 2014 14:08:40 GMT
    [ETag] => "54e35e8-8873-4f33ba00673f4"
    [Accept-Ranges] => bytes
    [Content-Length] => 34931
    [Connection] => close
    [Content-Type] => image/jpeg
)
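
With the associative form, the Last-Modified value the question asks about can be turned into a Unix timestamp. The sketch below is one way to do that; the function name last_modified_timestamp() is hypothetical, and it is run against a hard-coded copy of the array shown above rather than a live request.

```php
<?php
// Hypothetical helper: extract Last-Modified as a Unix timestamp from the
// associative array returned by get_headers($url, 1). Returns null if absent.
function last_modified_timestamp(array $headers): ?int
{
    // Header names are case-insensitive, so normalise the keys first.
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['last-modified'])) {
        return null;
    }
    $value = $headers['last-modified'];
    // If a header occurs more than once (for example across redirects),
    // get_headers() returns an array of values; take the last one.
    if (is_array($value)) {
        $value = end($value);
    }
    $timestamp = strtotime($value);
    return $timestamp === false ? null : $timestamp;
}

// Part of the array from the output above, used instead of a live request.
$sample = [
    0 => 'HTTP/1.1 200 OK',
    'Last-Modified' => 'Tue, 25 Feb 2014 14:08:40 GMT',
    'Content-Type' => 'image/jpeg',
];
echo gmdate('Y-m-d H:i:s', last_modified_timestamp($sample)), "\n";
// prints 2014-02-25 14:08:40
```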

Answer 3

Answered by Ian Kemp

(1) Yes. A HEAD request (as you're issuing in this case) is far lighter on the server because it only returns the HTTP headers, as opposed to the headers and content returned by a standard GET request.

(2) You need to set the CURLOPT_RETURNTRANSFER option to true before you call curl_exec() to have the content returned, as opposed to printed:

curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

That should also make your class work correctly.
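
Putting together the corrections suggested in this thread (pass the cURL handle to curl_getinfo(), set CURLOPT_RETURNTRANSFER, and read the filetime rather than a 'datetime' key), the class from the question might look roughly like the sketch below. The class name LastChange and the $url parameter are illustrative choices, not taken from the question.

```php
<?php
// A sketch of the question's class with the fixes from this thread applied.
class LastChange
{
    /** @var int Unix timestamp of the remote file, or -1 if unknown */
    public $last_change = -1;

    public function set_last_change($url)
    {
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_NOBODY, true);         // headers only (HEAD)
        curl_setopt($curl, CURLOPT_FILETIME, true);       // ask for the remote file time
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return output, do not print it
        curl_exec($curl);
        // Pass the handle itself, not the string returned by curl_exec().
        $this->last_change = curl_getinfo($curl, CURLINFO_FILETIME);
        // The handle is released when $curl goes out of scope.
    }

    public function get_last_change()
    {
        return $this->last_change;
    }
}
```

After calling set_last_change() with the file's URL, get_last_change() returns the timestamp, or -1 when the server did not provide one.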

Answer 4

Answered by lipinf

You can set the default stream context:

stream_context_set_default(
    array(
        'http' => array(
            'method' => 'HEAD'
        )
    )
);

Then use:

$headers = get_headers($url,1);

get_headers() seems to be more efficient than cURL, since get_headers() skips steps such as triggering authentication routines (login prompts or cookies).
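
Changing the default context affects every later stream call in the script. As an alternative sketch, assuming PHP 7.1 or later (where get_headers() accepts a context as its third argument), a context can be created once and passed only where it is needed. The wrapper name head_headers() and the 5-second timeout are arbitrary choices for this example.

```php
<?php
// Build a context that applies only where it is passed explicitly,
// leaving the default stream context untouched.
$context = stream_context_create([
    'http' => [
        'method'  => 'HEAD',
        'timeout' => 5, // seconds; an arbitrary value chosen for this sketch
    ],
]);

// Hypothetical wrapper: issue a HEAD request and return the headers keyed
// by name, or false on failure (the same return values as get_headers()).
function head_headers(string $url, $context)
{
    return get_headers($url, 1, $context);
}
```

A call such as head_headers($url, $context) then sends HEAD for that request only.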

Answer 5

Answered by rodrigo-silveira

Here is my implementation using CURLOPT_HEADER, then parsing the output string into a map:

function http_headers($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);

    $headers = curl_exec($ch);

    curl_close($ch);

    $data = [];
    $headers = explode(PHP_EOL, $headers);
    foreach ($headers as $row) {
        $parts = explode(':', $row);
        if (count($parts) === 2) {
            $data[trim($parts[0])] = trim($parts[1]);
        }
    }

    return $data;
};

Sample usage:

$headers = http_headers('https://i.ytimg.com/vi_webp/g-dKXOlsf98/hqdefault.webp');
print_r($headers);

Array
(
    ['Content-Type'] => 'image/webp'
    ['ETag'] => '1453807629'
    ['X-Content-Type-Options'] => 'nosniff'
    ['Server'] => 'sffe'
    ['Content-Length'] => 32958
    ['X-XSS-Protection'] => '1; mode=block'
    ['Age'] => 11
    ['Cache-Control'] => 'public, max-age=7200'
)
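
One thing to watch in the function above: explode(':', $row) splits on every colon, so a header whose value itself contains colons (such as a Date or Last-Modified value) produces more than two parts and is silently dropped. A possible variant of the parsing step, shown here as a standalone sketch with the hypothetical name parse_header_block(), splits at the first colon only:

```php
<?php
// Hypothetical variant of the parsing loop above: takes the raw header
// string returned by curl_exec() and returns a name => value map.
function parse_header_block(string $raw): array
{
    $data = [];
    // HTTP header lines end in CRLF; also tolerate a bare LF.
    foreach (preg_split('/\r?\n/', $raw) as $row) {
        // A limit of 2 splits at the first colon only, so values that
        // contain colons (dates, URLs) are kept intact.
        $parts = explode(':', $row, 2);
        if (count($parts) === 2) {
            $data[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $data;
}

// A hard-coded header block, so the example runs without a network call.
$raw = "HTTP/1.1 200 OK\r\n"
     . "Date: Fri, 04 Sep 2009 12:15:51 GMT\r\n"
     . "Content-Type: text/xml\r\n\r\n";
print_r(parse_header_block($raw));
```

The status line has no colon and is skipped, while the Date header is kept with its full value.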

Answer 6

Answered by Greg

You need to add

curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

to return the header instead of printing it.

Whether returning only the headers is lighter on the server depends on the script that is running, but usually it will be.

I think you also want "filetime" instead of "datetime".
