cURL Multi Threading with PHP

Note: this page is a mirror of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/12394027/

Tags: php, curl, nginx

Asked by user1647347

I'm using cURL to get some rank data for over 20,000 domain names that I've got stored in a database.

The code I'm using is http://semlabs.co.uk/journal/object-oriented-curl-class-with-multi-threading.

The array $competeRequests holds 20,000 requests to the compete.com API for website ranks.

This is an example request: http://apps.compete.com/sites/stackoverflow.com/trended/rank/?apikey=xxxx&start_date=201207&end_date=201208&jsonp=

Since there are 20,000 of these requests, I want to break them up into chunks, so I'm using the following code to accomplish that:

foreach (array_chunk($competeRequests, 1000) as $requests) {
    foreach ($requests as $request) {
        $curl->addSession($request, $opts);
    }
}

This works great for sending the requests in batches of 1,000; however, the script takes too long to execute. I've increased max_execution_time to over 10 minutes.

Is there a way to send 1,000 requests from my array, parse the results, output a status update, and then continue with the next 1,000 until the array is empty? As of now the screen just stays white for the entire time the script is executing, which can be over 10 minutes.

Accepted answer by Glenn Plas

This one always does the job for me... https://github.com/petewarden/ParallelCurl

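For context, a minimal sketch of how ParallelCurl is typically driven, based on the project's README (the callback name, concurrency limit, and options here are illustrative, not from the original answer):

require_once('parallelcurl.php');

// Called once for each completed request with the response body.
function on_request_done($content, $url, $ch, $user_data) {
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($httpcode == 200) {
        // ... parse the rank data out of $content here ...
    }
}

// Allow at most 10 requests in flight at once.
$parallel_curl = new ParallelCurl(10, array(CURLOPT_RETURNTRANSFER => true));

foreach ($competeRequests as $url) {
    $parallel_curl->startRequest($url, 'on_request_done', array());
}

$parallel_curl->finishAllRequests(); // blocks until every request completes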

Answer by Mani

The accepted answer above is outdated, so the correct answer should be upvoted.

http://php.net/manual/en/function.curl-multi-init.php

Now, PHP supports fetching multiple URLs at the same time.

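As a rough illustration of the curl_multi_* API from that manual page (the URLs are placeholders):

// Fetch several URLs concurrently with curl_multi.
$urls = array(
    'http://example.com/a',
    'http://example.com/b',
);

$mh = curl_multi_init();
$handles = array();

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all transfers until none remain active.
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh); // wait for activity instead of busy-looping
    }
} while ($active && $status == CURLM_OK);

// Collect the response bodies and clean up.
$responses = array();
foreach ($handles as $ch) {
    $responses[] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);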

Answer by Joe Watkins

https://github.com/krakjoe/pthreads

You can thread in PHP. The code depicted (in a screenshot accompanying the original answer) is just horrible thread programming, and I don't advise doing it that way, but I wanted to show you the overhead of 20,000 threads: it takes 18 seconds on my current hardware, an Intel G620 (dual core) with 8 GB of RAM; on server hardware you can expect much faster results. How you thread such a task depends on your resources and on the resources of the service you are requesting.

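The original code only survives as that screenshot, so here is a minimal, hypothetical pthreads sketch of the same one-thread-per-request idea (which, as the answer itself says, is horrible thread programming at a scale of 20,000 threads). It requires the pthreads extension on a ZTS build of PHP, and the class name is illustrative:

// One Thread subclass per URL; run() performs a blocking cURL request.
class FetchJob extends Thread
{
    public $url;
    public $response;

    public function __construct($url)
    {
        $this->url = $url;
    }

    public function run()
    {
        $ch = curl_init($this->url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $this->response = curl_exec($ch);
        curl_close($ch);
    }
}

$jobs = array();
foreach ($competeRequests as $url) {
    $jobs[] = $job = new FetchJob($url);
    $job->start();
}
foreach ($jobs as $job) {
    $job->join(); // wait for the thread, then read $job->response
}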

Answer by Nelson

Put this at the top of your php script:

set_time_limit(0);
@apache_setenv('no-gzip', 1); // comment this out if you use nginx instead of apache
@ini_set('zlib.output_compression', 0);
@ini_set('implicit_flush', 1);
while (ob_get_level() > 0) { ob_end_flush(); } // flush every output-buffering level
ob_implicit_flush(1);

That disables any output buffering the web server or PHP may be doing, making your output appear in the browser while the script is running.

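With buffering disabled, the asker's chunked loop can emit a status line after each batch. A sketch, assuming the $curl class can run and reset a batch with methods like these (exec() and clear() are illustrative names, not necessarily the class's real API):

foreach (array_chunk($competeRequests, 1000) as $i => $requests) {
    foreach ($requests as $request) {
        $curl->addSession($request, $opts);
    }
    $results = $curl->exec(); // hypothetical: run this batch of 1,000
    // ... parse $results here ...
    $curl->clear();           // hypothetical: reset for the next batch
    echo 'Finished batch ' . ($i + 1) . "<br>\n";
    flush();                  // reaches the browser thanks to the settings above
}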

Remember to comment out the apache_setenv line if you use the nginx web server instead of Apache.

Update for nginx:

So the OP is using nginx, which makes things a bit trickier, as nginx doesn't let you disable gzip compression from PHP. I also use nginx, and I just found out I have it active by default; see:

cat /etc/nginx/nginx.conf | grep gzip
    gzip on;
    gzip_disable "msie6";
    # gzip_vary on;
    # gzip_proxied any;
    # gzip_comp_level 6;
    # gzip_buffers 16 8k;
    # gzip_http_version 1.1;
    # gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;

So you need to disable gzip in nginx.conf and restart nginx:

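Assuming the stock configuration shown above, the change in /etc/nginx/nginx.conf is just flipping the one directive:

gzip off;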

/etc/init.d/nginx restart

Or you can play with the gzip_disable or gzip_types options to conditionally disable gzip for some browsers or for some page content types, respectively.
