PHP: How to partially download a remote file with cURL?

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/2032924/

Time: 2020-08-25 04:44:04  Source: igfitidea

How to partially download a remote file with cURL?

Tags: php, curl

Asked by Ken

Is it possible to partially download a remote file with cURL? Let's say, the actual filesize of the remote file is 1000 KB. How can I download only first 500 KB of it?

Answered by VolkerK

You can also set the range header parameter with the php-curl extension.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.spiegel.de/');
curl_setopt($ch, CURLOPT_RANGE, '0-500'); // byte ranges are inclusive, so this asks for 501 bytes
// CURLOPT_BINARYTRANSFER has had no effect since PHP 5.1.3, so it is omitted here
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);
echo $result;

But, as noted before, if the server doesn't honor this header and sends the whole file anyway, curl will download all of it. For example, http://www.php.net ignores the header. You can, in addition, set a write-function callback and abort the request once more data than you want has been received, e.g.

// php 5.3+ only
// use function writefn($ch, $chunk) { ... } for earlier versions
$writefn = function($ch, $chunk) { 
  static $data='';
  static $limit = 500; // 500 bytes, it's only a test

  $len = strlen($data) + strlen($chunk);
  if ($len >= $limit ) {
    $data .= substr($chunk, 0, $limit-strlen($data));
    echo strlen($data) , ' ', $data;
    return -1;
  }

  $data .= $chunk;
  return strlen($chunk);
};

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.php.net/');
curl_setopt($ch, CURLOPT_RANGE, '0-500');
curl_setopt($ch, CURLOPT_WRITEFUNCTION, $writefn);
// curl_exec() returns false when the callback aborts the transfer; that is expected here
$result = curl_exec($ch);
curl_close($ch);
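
To see whether a given server honored the range at all, one option is to look at the HTTP status code: 206 (Partial Content) means the range was applied, while a server that ignores the header usually answers 200 with the whole body. The sketch below is only an illustration; fetch_range() and range_was_honored() are hypothetical helper names, not part of the answer above.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// true when the status code says the server sent only the requested range.
function range_was_honored(int $status): bool
{
    return $status === 206; // 206 = Partial Content
}

// Hypothetical helper: request bytes $first..$last (inclusive) of $url and
// report the status code, whether the range was honored, and the body.
function fetch_range(string $url, int $first, int $last): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RANGE, $first . '-' . $last);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // The handle is released when $ch goes out of scope.

    return [
        'status'  => $status,
        'honored' => range_was_honored($status),
        'body'    => $body === false ? '' : $body,
    ];
}
```

If 'honored' comes back false, the body may be the entire file, so the write-callback approach is still the safer way to cap the download size.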

Answered by SpliFF

Get the first 100 bytes of a document:

curl -r 0-99 http://www.get.this

from the manual

make sure you have a modern curl

Answered by upteryx

Thanks for the nice solution, VolkerK. However, I needed to use this code as a function, so here's what I came up with. I hope it's useful for others. The main difference is use ($limit, &$datadump), so a limit can be passed in, and the by-reference variable $datadump is used to return the result. I also added CURLOPT_USERAGENT because some websites won't allow access without a User-Agent header.

Check http://php.net/manual/en/functions.anonymous.php

function curl_get_contents_partial($url, $limit) {
  $datadump = ''; // filled by the callback; also returned when the file is shorter than $limit

  $writefn = function($ch, $chunk) use ($limit, &$datadump) {
    $remaining = $limit - strlen($datadump);
    if (strlen($chunk) >= $remaining) {
      // Limit reached: keep only what still fits, then abort the transfer.
      $datadump .= substr($chunk, 0, $remaining);
      return -1;
    }
    $datadump .= $chunk;
    return strlen($chunk); // tell curl the whole chunk was consumed
  };

  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
  //curl_setopt($ch, CURLOPT_RANGE, '0-1000'); //not honored by many sites, maybe just remove it altogether.
  curl_setopt($ch, CURLOPT_WRITEFUNCTION, $writefn);
  curl_exec($ch); // returns false when the callback aborts; the data is in $datadump
  curl_close($ch);
  return $datadump;
}

usage:
$page = curl_get_contents_partial('http://some.webpage.com', 1000); //read the first 1000 bytes
echo $page; // or do whatever with the result.

Answered by amir beygi

This could be your solution (it downloads the first 500 KB into output.txt):

curl -r 0-511999 http://www.yourwebsite.com > output.txt
  • where 511999 is 500*1024 - 1 (byte ranges are zero-based and inclusive)
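
Since byte ranges start at 0 and include the last offset, the range for "the first N bytes" is 0 to N-1. A small sketch of that arithmetic in PHP follows; first_bytes_range() is a hypothetical helper name used only for illustration, and its result can be passed to CURLOPT_RANGE or to curl -r.

```php
<?php
// Hypothetical helper (an assumption, not from the answers above):
// build the range string that covers the first $bytes bytes of a resource.
function first_bytes_range(int $bytes): string
{
    if ($bytes < 1) {
        throw new InvalidArgumentException('Need at least one byte.');
    }
    // Offsets are zero-based and inclusive, so the last offset is $bytes - 1.
    return '0-' . ($bytes - 1);
}

echo first_bytes_range(500 * 1024), "\n"; // 0-511999
echo first_bytes_range(100), "\n";        // 0-99
```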