Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/9308779/
PHP Parallel curl requests
Asked by user1205408
I am building a simple app that reads JSON data from 15 different URLs. I have a special requirement that this be done on the server side. I am using file_get_contents($url).
Since I am using file_get_contents($url), I wrote a simple script. Here it is:
$websites = array(
    $url1,
    $url2,
    $url3,
    ...
    $url15
);
foreach ($websites as $website) {
    $data[] = file_get_contents($website);
}
It proved to be very slow, because it waits for each request to finish before starting the next one.
Answered by Sudhir Bastakoti
If you mean multi-curl, then something like this might help:
$nodes = array($url1, $url2, $url3);
$node_count = count($nodes);
$curl_arr = array();
$master = curl_multi_init();
// create one easy handle per URL and attach it to the multi handle
for ($i = 0; $i < $node_count; $i++) {
    $url = $nodes[$i];
    $curl_arr[$i] = curl_init($url);
    curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($master, $curl_arr[$i]);
}
// drive all transfers until none is still running
do {
    curl_multi_exec($master, $running);
} while ($running > 0);
// collect the response bodies in the same order as $nodes
$results = array();
for ($i = 0; $i < $node_count; $i++) {
    $results[] = curl_multi_getcontent($curl_arr[$i]);
}
print_r($results);
Hope it helps in some way
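The loop above calls curl_multi_exec() continuously until every transfer finishes, which keeps a CPU core busy while waiting on the network. The following is a minimal sketch of the same idea that blocks in curl_multi_select() between calls. The function name fetch_all and its $timeout parameter are hypothetical and not part of the original answer, and error handling is left out.

```php
<?php
// Hypothetical helper (not from the original answer): fetch every URL
// concurrently and return the bodies under the same keys as the input.
function fetch_all(array $urls, float $timeout = 1.0): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    $running = 0;
    do {
        $status = curl_multi_exec($mh, $running);
        // Wait for activity instead of spinning; if select reports -1,
        // sleep briefly so the loop cannot busy-wait.
        if ($running > 0 && curl_multi_select($mh, $timeout) === -1) {
            usleep(1000);
        }
    } while ($running > 0 && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

Calling fetch_all(array($url1, $url2, $url3)) would then return the three bodies in the same order as the input array.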
Answered by Timo Huovinen
I would like to provide a more complete example that does not pin the CPU at 100% or crash when there is a slight error or something unexpected.
It also shows how to fetch the headers, the body, and the request info, and how to follow redirects manually.
Disclaimer: this code is intended to be extended and built into a library, or used as a quick starting point, so the functions inside it are kept to a minimum.
function mtime(){
    return microtime(true);
}
function ptime($prev){
    // elapsed milliseconds since $prev, right-padded with zeros to 20 characters
    $t = (microtime(true) - $prev) * 1000;
    return str_pad((string)$t, 20, "0", STR_PAD_RIGHT);
}
function curl_multi_exec_full($mh, &$still_running) {
    // In theory curl_multi_exec should never return CURLM_CALL_MULTI_PERFORM (-1) because it has been deprecated
    // In practice it sometimes does
    // So imagine that this just runs curl_multi_exec once and returns its value
    do {
        $state = curl_multi_exec($mh, $still_running);
        // curl_multi_select($mh, $timeout) blocks for up to $timeout seconds while curl_multi_exec() returns CURLM_CALL_MULTI_PERFORM
        // We add it to prevent CPU 100% usage in case this thing misbehaves
    } while ($still_running > 0 && $state === CURLM_CALL_MULTI_PERFORM && curl_multi_select($mh, 0.1));
    return $state;
}
function curl_multi_wait($mh, $minTime = 0.001, $maxTime = 1){
    // minimum interval between runs, in microseconds
    $umin = $minTime * 1000000;
    $start_time = microtime(true);
    // it sleeps until there is some activity on any of the descriptors (curl files)
    // it returns the number of descriptors (curl files that can have activity)
    $num_descriptors = curl_multi_select($mh, $maxTime);
    // if the system returns -1, it means that the wait time is unknown, and we have to decide the minimum time to wait
    // but our `$timespan` check below catches this edge case, so this `if` isn't really necessary
    if ($num_descriptors === -1) {
        usleep((int)$umin);
    }
    // elapsed time, converted to microseconds so it can be compared with $umin
    $timespan = (microtime(true) - $start_time) * 1000000;
    // This thing runs very fast, up to 1000 times for 2 urls, which wastes a lot of CPU
    // This will reduce the runs so that each interval is separated by at least minTime
    if ($timespan < $umin) {
        usleep((int)($umin - $timespan));
        //print "sleep for ".($umin - $timespan).PHP_EOL;
    }
}
$handles = [
    [
        CURLOPT_URL=>"http://example.com/",
        CURLOPT_HEADER=>false,
        CURLOPT_RETURNTRANSFER=>true,
        CURLOPT_FOLLOWLOCATION=>false,
    ],
    [
        CURLOPT_URL=>"http://www.php.net",
        CURLOPT_HEADER=>false,
        CURLOPT_RETURNTRANSFER=>true,
        CURLOPT_FOLLOWLOCATION=>false,
        // this function is called by curl for each header received
        // This complies with RFC822 and RFC2616, please do not suggest edits to make use of the mb_ string functions, it is incorrect!
        // https://stackoverflow.com/a/41135574
        CURLOPT_HEADERFUNCTION=>function($ch, $header)
        {
            print "header from http://www.php.net: ".$header;
            //$header = explode(':', $header, 2);
            //if (count($header) < 2){ // ignore invalid headers
            //    return $len;
            //}
            //$headers[strtolower(trim($header[0]))][] = trim($header[1]);
            return strlen($header);
        }
    ]
];
//create the multiple cURL handle
$mh = curl_multi_init();
$chandles = [];
foreach ($handles as $opts) {
    // create cURL resources
    $ch = curl_init();
    // set URL and other appropriate options
    curl_setopt_array($ch, $opts);
    // add the handle
    curl_multi_add_handle($mh, $ch);
    $chandles[] = $ch;
}
//execute the multi handle
// start from the total number of handles, so transfers that finish during the first call are still read
$prevRunning = count($chandles);
$count = 0;
do {
    $time = mtime();
    // $running contains the number of currently running requests
    $status = curl_multi_exec_full($mh, $running);
    $count++;
    print ptime($time).": curl_multi_exec status=$status running $running".PHP_EOL;
    // One less is running, meaning one has finished
    if ($running < $prevRunning) {
        print ptime($time).": curl_multi_info_read".PHP_EOL;
        // msg: The CURLMSG_DONE constant. Other return values are currently not available.
        // result: One of the CURLE_* constants. If everything is OK, the CURLE_OK will be the result.
        // handle: Resource of type curl indicates the handle which it concerns.
        while ($read = curl_multi_info_read($mh, $msgs_in_queue)) {
            $info = curl_getinfo($read['handle']);
            if ($read['result'] !== CURLE_OK) {
                // handle the error somehow
                print "Error: ".$info['url'].PHP_EOL;
            } else {
                /*
                // This will automatically follow the redirect and still give you control over the previous page
                // TODO: max redirect checks and redirect timeouts
                if(isset($info['redirect_url']) && trim($info['redirect_url'])!==''){
                    print "running redirect: ".$info['redirect_url'].PHP_EOL;
                    $ch3 = curl_init();
                    curl_setopt($ch3, CURLOPT_URL, $info['redirect_url']);
                    curl_setopt($ch3, CURLOPT_HEADER, 0);
                    curl_setopt($ch3, CURLOPT_RETURNTRANSFER, 1);
                    curl_setopt($ch3, CURLOPT_FOLLOWLOCATION, 0);
                    curl_multi_add_handle($mh,$ch3);
                }
                */
                print_r($info);
                $body = curl_multi_getcontent($read['handle']);
                print $body;
            }
        }
    }
    // Still running? keep waiting...
    if ($running > 0) {
        curl_multi_wait($mh);
    }
    $prevRunning = $running;
} while ($running > 0 && $status == CURLM_OK);
//close the handles
foreach ($chandles as $ch) {
    curl_multi_remove_handle($mh, $ch);
}
curl_multi_close($mh);
print $count.PHP_EOL;
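The header callback above only prints each header, and the lines that would store them are commented out. As a rough sketch of how those commented lines could be completed, the helper below builds a CURLOPT_HEADERFUNCTION callback that collects headers into an array. The name make_header_collector is hypothetical and not part of the original answer. Lines without a colon, such as the status line and the blank line that ends the headers, are skipped.

```php
<?php
// Hypothetical helper (not from the original answer): returns a callback
// suitable for CURLOPT_HEADERFUNCTION that stores every header it sees in
// $headers, keyed by the lower-cased header name.
function make_header_collector(array &$headers): callable
{
    return function ($ch, string $line) use (&$headers): int {
        // curl expects the callback to return the number of bytes it handled
        $len = strlen($line);
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            // repeated headers (e.g. Set-Cookie) are appended, not overwritten
            $headers[strtolower(trim($parts[0]))][] = trim($parts[1]);
        }
        return $len;
    };
}

// Possible usage with one of the option arrays above:
// $headers = [];
// $opts[CURLOPT_HEADERFUNCTION] = make_header_collector($headers);
```

Because the array is captured by reference, each handle would need its own array to keep the headers of different requests apart.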