Parallel processing in PHP - How do you do it?
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/6107339/
Asked by enricog
I am currently trying to implement a job queue in PHP. The queue will then be processed as a batch job and should be able to process some jobs in parallel.
I have already done some research and found several ways to implement it, but I am not really aware of their advantages and disadvantages.
E.g. doing the parallel processing by calling a script several times through fsockopen, as explained here: Easy parallel processing in PHP
Another way I found was using the curl_multi functions: curl_multi_exec PHP docs
But I think those 2 ways will add quite a lot of overhead for batch processing on a queue that should mainly run in the background?
I also read about pcntl_fork, which also seems to be a way to handle the problem. But that looks like it can get really messy if you don't really know what you are doing (like me at the moment ;)).
I also had a look at Gearman, but there I would also need to spawn the worker threads dynamically as needed, and not just run a few and let the Gearman job server send the jobs to the free workers. Especially because the threads should exit cleanly after one job has been executed, to avoid running into eventual memory leaks (the code may not be perfect in that respect).
Gearman Getting Started
So my question is, how do you handle parallel processing in PHP? And why did you choose your method; which advantages/disadvantages do the different methods have?
Thanks for any input.
Accepted answer by Quamis
I use exec(). It's easy and clean. You basically need to build a thread manager, and thread scripts, that will do what you need.
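A rough sketch of what such a manager could look like: it starts every worker command first and only then collects the output, so the workers run side by side. Note that this uses proc_open() rather than exec(), because exec() blocks until the command finishes unless the shell puts it in the background. The name runInParallel() and the sample commands are made up for illustration; they are not from the answer.

```php
<?php
// Hypothetical helper: run each command as its own process and collect its stdout.
function runInParallel(array $commands): array
{
    $procs = [];
    $pipes = [];
    foreach ($commands as $key => $command) {
        $spec = [1 => ['pipe', 'w']];            // capture stdout only
        $proc = proc_open($command, $spec, $p);  // returns immediately, worker keeps running
        if (!is_resource($proc)) {
            throw new RuntimeException("Could not start: $command");
        }
        $procs[$key] = $proc;
        $pipes[$key] = $p[1];
    }

    $results = [];
    foreach ($procs as $key => $proc) {
        // Blocks until this worker closes its stdout; the others keep working meanwhile.
        $results[$key] = stream_get_contents($pipes[$key]);
        fclose($pipes[$key]);
        proc_close($proc);                       // reap the finished worker
    }
    return $results;
}

// Example: two throwaway "thread scripts" run through the current PHP binary.
$php = escapeshellarg(PHP_BINARY);
$output = runInParallel([
    'sum'   => $php . ' -r ' . escapeshellarg('echo 2 + 3;'),
    'upper' => $php . ' -r ' . escapeshellarg('echo strtoupper("job");'),
]);
```

In a real queue the commands would point at your own worker script with a job id as argument, and you would probably cap how many workers run at once.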
I don't like fsockopen() because it opens a server connection; these connections build up and may hit Apache's connection limit.
I don't like the curl functions for the same reason.
I don't like pcntl because it needs the pcntl extension to be available, and you have to keep track of parent/child relations.
Never played with Gearman...
Answer by Mahmoud Zalt
Well, I guess we have 3 options there:
A. Multi-Thread:
PHP does not support multithreading natively. But there is one (experimental) PHP extension called pthreads (https://github.com/krakjoe/pthreads) that allows you to do just that.
B. Multi-Process:
This can be done in 3 ways:
- Forking
- Executing Commands
- Piping
C. Distributed Parallel Processing:
How it works:
- The Client App sends data (AKA a message), which can be JSON formatted, to the Engine (MQ Engine), which can be local or an external web service.
- The MQ Engine stores the data (mostly in memory and optionally in a database) inside a queue (you can define the queue name).
- The Client App asks the MQ Engine for data (messages) to process in order (FIFO or based on priority); you can also request data from a specific queue.
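To make the flow above concrete, here is a tiny in-process stand-in for an MQ engine. It is only a sketch: the class name InMemoryBroker and its push()/pop() methods are invented for this example, and a real engine would run as a separate service, typically with persistence and priorities.

```php
<?php
// Hypothetical in-process stand-in for an MQ engine: named FIFO queues holding JSON messages.
final class InMemoryBroker
{
    /** @var SplQueue[] one FIFO queue per queue name */
    private $queues = [];

    // Client side: send a message (stored as JSON) to the named queue.
    public function push(string $queue, array $payload): void
    {
        if (!isset($this->queues[$queue])) {
            $this->queues[$queue] = new SplQueue();
        }
        $this->queues[$queue]->enqueue(json_encode($payload));
    }

    // Worker side: ask for the oldest message of the named queue, or null if it is empty.
    public function pop(string $queue): ?array
    {
        if (!isset($this->queues[$queue]) || $this->queues[$queue]->isEmpty()) {
            return null;
        }
        return json_decode($this->queues[$queue]->dequeue(), true);
    }
}

$broker = new InMemoryBroker();
$broker->push('emails', ['to' => 'a@example.com']);
$broker->push('emails', ['to' => 'b@example.com']);
$first = $broker->pop('emails'); // the message pushed first comes out first (FIFO)
```

Swapping SplQueue for SplPriorityQueue would give the priority-based ordering mentioned above.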
Some MQ Engines:
- ZeroMQ (good option, hard to use): a message-oriented IPC library rather than a message queue server; it keeps messages in memory. It is a socket library that acts as a concurrency framework. Faster than TCP for clustered products and supercomputing.
- RabbitMQ (good option, easy to use): self-hosted, enterprise message queues. Not really a work queue, but rather a message queue that can be used as a work queue but requires additional semantics.
- Beanstalkd (best option, easy to use): Laravel built-in support, originally built for a Facebook application, made for work queues. Has a "Beanstalkd console" tool which is very nice.
- Gearman (problem: centralized broker system for distributed processing)
- Apache ActiveMQ: the most popular open source message broker in Java (problem: lots of bugs and problems)
- Amazon SQS (Laravel built-in support, hosted, so no administration is required. Not really a work queue, thus it will require extra work to handle semantics such as burying a job)
- IronMQ (Laravel built-in support, written in Go, available both as a cloud version and on-premise)
- Redis (Laravel built-in support, not that fast, as it's not designed for that)
- Sparrow (written in Ruby, based on memcache)
- Starling (written in Ruby, based on memcache, built at Twitter)
- Kestrel (just another QM)
- Kafka (written at LinkedIn in Scala)
- EagleMQ: open source, high-performance and lightweight queue manager (written in C)
More of them can be found here: http://queues.io
Answer by inquam
If your application is going to run under a Unix/Linux environment I would suggest you go with the forking option. It's basically child's play to get it working. I have used it for a Cron manager and had code for it to revert to a Windows-friendly codepath if forking was not an option.
The options of running the entire script several times do, as you state, add quite a bit of overhead. If your script is small it might not be a problem. But you will probably get used to doing parallel processing in PHP the way you choose now. And next time, when you have a job that uses 200 MB of data, it might very well be a problem. So you'd be better off learning a way that you can stick with.
I have also tested Gearman and I like it a lot. There are a few things to think about, but as a whole it offers a very good way to distribute work to different servers running different applications written in different languages. Besides setting it up, actually using it from within PHP, or any other language for that matter, is... once again... child's play.
It could very well be overkill for what you need to do. But it will open your eyes to new possibilities when it comes to handling data and jobs, so I would recommend that you try Gearman for that reason alone.
Answer by Rakesh Sankar
I use PHP's pcntl - it is good as long as you know what you are doing. I understand your situation, but I don't think the code is difficult to understand; we just have to be a little more careful than usual when implementing a job queue or parallel processing.
I feel it works as long as you code it carefully and make sure the flow is correct. Of course, you should keep the parallel processing in mind when you implement it.
Where you could make mistakes:
- Loops - you should be able to handle these with GLOBAL vars.
- Processing some set of transactions - again, as long as you define the sets properly, you should be able to get it done.
Take a look at this example - https://github.com/rakesh-sankar/Tools/blob/master/PHP/fork-parallel-process.php.
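For readers who cannot open the link, a minimal fork-per-job sketch might look like the following. It is not the code from the linked file; forkJobs() and the sample handler are names invented here, and it assumes the CLI with the pcntl extension loaded (hence the function_exists() guard).

```php
<?php
// Sketch: fork one child per job and wait for all of them (CLI + pcntl extension only).
function forkJobs(array $jobs, callable $handler): array
{
    $children = [];
    foreach ($jobs as $key => $job) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('fork failed');
        }
        if ($pid === 0) {
            // Child: handle exactly one job, then exit so its memory is released.
            exit($handler($job) ? 0 : 1);
        }
        $children[$pid] = $key;              // parent: remember which job this child owns
    }

    $exitCodes = [];
    while ($children) {
        $pid = pcntl_wait($status);          // blocks until any child exits
        if ($pid === -1) {
            break;                           // no children left to wait for
        }
        if (isset($children[$pid])) {
            $exitCodes[$children[$pid]] = pcntl_wexitstatus($status);
            unset($children[$pid]);
        }
    }
    return $exitCodes;
}

if (function_exists('pcntl_fork')) {
    // Hypothetical handler: a job "succeeds" when its number is even.
    $codes = forkJobs(['first' => 2, 'second' => 3], function ($n) {
        return $n % 2 === 0;
    });
}
```

The parent only tracks a PID-to-job map, which is the "parent/child relations" bookkeeping mentioned elsewhere in this thread. Anything the child computes has to be passed back through the exit code, a file, or some other IPC channel, because the child works on a copy of the parent's memory.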
Hope it helps.
Answer by Magic
I prefer exec() and Gearman. exec() is easy, needs no connection and consumes less memory. Gearman needs a socket connection, and the workers take some memory. But Gearman is more flexible and faster than exec(). Most importantly, it can deploy the workers on other servers, which matters if the work is time- and resource-consuming. I'm using Gearman in my current project.
Answer by Simon East
Here's a summary of a few options for parallel processing in PHP.
AMP
Check out Amp - Asynchronous concurrency made simple - this looks to be the most mature PHP library I've seen for parallel processing.
Peec's Process Class
This class was posted in the comments of PHP's exec() function and provides a really simple starting point for forking new processes and keeping track of them.
Example:
// You may use status(), start(), and stop(). Note that the start() method gets called automatically one time.
$process = new Process('ls -al');
// Or, if you already have the PID; in that case only the status() method will work.
$process = new Process();
$process->setPid($my_pid);
// Then you can start/stop/check the status of the job.
$process->stop();
$process->start();
if ($process->status()) {
    echo "The process is currently running";
} else {
    echo "The process is not running.";
}
Other Options Compared
There's also a great article, Async processing or multitasking in PHP, that explains the pros and cons of various approaches:
- pthreads extension (see also this SitePoint article)
- Amp\Thread Library
- Hack's async (requires running Facebook's HHVM)
- pcntl_fork
- popen
- fopen/curl/fsockopen
Doorman
Then, there's also this simple tutorial, which was wrapped up into a little library called Doorman.
Hope these links provide a useful starting point for more research.
Answer by kevin Sue
First of all, this answer is based on a Linux OS environment.
Yet another PECL extension is parallel; you can install it by running pecl install parallel, but it has some prerequisites:
- A ZTS (Zend Thread Safety) build of PHP 7.2+ must be installed
- If you build this extension from source, you should check the php.ini-style config file, then add extension=parallel.so to it
Then see the full example gist: https://gist.github.com/krakjoe/0ee02b887288720d9b785c9f947f3a0a or the official PHP site: https://www.php.net/manual/en/book.parallel.php
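As a small taste of what the extension offers (this is not the code from the gist), the sketch below uses the functional API, parallel\run(), to sum a few chunks of numbers in separate threads. parallelSum() is a made-up name, and the code only does anything on a ZTS build with the extension loaded, hence the extension_loaded() guard.

```php
<?php
use function parallel\run;

// Hypothetical helper: sum each chunk in its own thread, then add up the partial sums.
function parallelSum(array $chunks)
{
    $futures = [];
    foreach ($chunks as $chunk) {
        // run() schedules the closure on a worker thread and returns a Future.
        $futures[] = run(function (array $numbers) {
            return array_sum($numbers);
        }, [$chunk]);
    }

    $total = 0;
    foreach ($futures as $future) {
        $total += $future->value();   // blocks until that task has finished
    }
    return $total;
}

if (extension_loaded('parallel')) {
    $total = parallelSum([[1, 2, 3], [4, 5], [6]]);
}
```

Arguments are copied into the worker thread, so the closure cannot modify the caller's variables; results come back only through the Future.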
Answer by symcbean
The method described in 'Easy parallel processing in PHP' is downright scary - the principle is OK - but the implementation??? As you've already pointed out, the curl_multi_ functions provide a much better way of implementing this approach.
But I think those 2 ways will add pretty much overhead
Yes, you probably don't need a client and server HTTP stack for handing off the job - but unless you're working for Google, your development time is much more expensive than your hardware costs - and there are plenty of tools for managing HTTP/analysing performance - and there is a defined standard covering stuff such as status notifications and authentication.
A lot of how you implement the solution depends on the level of transactional integrity you require and whether you require in-order processing.
Out of the approaches you mention, I'd recommend focusing on the HTTP request method using curl_multi_. But if you need good transactional control / in-order delivery, then you should definitely run a broker daemon between the source of the messages and the processing agents (there is a well-written single-threaded server suitable for use as a framework for the broker here). Note that the processing agents should process a single message at a time.
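A minimal sketch of the curl_multi_ approach, assuming the curl extension is available: every URL gets its own handle, and one loop drives all transfers at once. fetchAll() is a name made up for this example, and the worker URL in the usage note is hypothetical.

```php
<?php
// Hypothetical helper: fetch several URLs concurrently, returning bodies keyed like the input.
function fetchAll(array $urls): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($handle, CURLOPT_TIMEOUT, 30);          // give up on a single job after 30 s
        curl_multi_add_handle($multi, $handle);
        $handles[$key] = $handle;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity (at most 1 s) instead of spinning
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $key => $handle) {
        $bodies[$key] = curl_multi_getcontent($handle);
        curl_multi_remove_handle($multi, $handle);
    }
    return $bodies; // handles are released when they go out of scope
}
```

A call could look like fetchAll(['job1' => 'http://localhost/worker.php?id=1', 'job2' => 'http://localhost/worker.php?id=2']), where worker.php stands for whatever script processes a single job. Error handling (curl_error(), HTTP status codes) is left out to keep the sketch short.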
If you need a highly scalable solution, then take a look at a proper message queuing system such as RabbitMQ.
HTH
C.