Parallelizing tasks in Node.js
Original question: http://stackoverflow.com/questions/19120213/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by Jeroen De Dauw
I have some tasks I want to do in JS that are resource intensive. For this question, let's assume they are heavy calculations rather than system access. Now I want to run tasks A, B and C at the same time, and execute some function D when they are all done.
The async library provides a nice scaffolding for this:
async.parallel([A, B, C], D);
If what I am doing is just calculations, then this will still run synchronously (unless the library is putting the tasks on different threads itself, which I expect is not the case). How do I make this be actually parallel? What is the thing done typically by async code to not block the caller (when working with NodeJS)? Is it starting a child process?
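To make the concern concrete, here is a minimal sketch. The `parallel` helper is a hypothetical stand-in written for illustration, not the async library's implementation, and `makeTask` and the loop size are assumptions too. It records the order in which CPU-bound tasks start and finish:

```javascript
// Hypothetical stand-in for a parallel-style helper (NOT the async library itself).
function parallel(tasks, done) {
  const results = new Array(tasks.length);
  let pending = tasks.length;
  let finished = false;
  tasks.forEach(function (task, index) {
    task(function (err, value) {
      if (finished) return;
      if (err) { finished = true; return done(err); }
      results[index] = value;
      if (--pending === 0) { finished = true; done(null, results); }
    });
  });
}

const order = [];

// Build a purely CPU-bound task that logs when it starts and ends.
function makeTask(name) {
  return function (callback) {
    order.push(name + ' start');
    let sum = 0;
    for (let i = 0; i < 1e6; i++) sum += i; // heavy calculation, no I/O
    order.push(name + ' end');
    callback(null, sum);
  };
}

parallel([makeTask('A'), makeTask('B'), makeTask('C')], function D(err, results) {
  // prints "A start, A end, B start, B end, C start, C end"
  console.log(order.join(', '));
});
```

Each task runs to completion before the next one starts, so with pure calculations the helper only organizes callbacks. Nothing overlaps.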
Answered by Matt Self
How do I make this be actually parallel?
First, you won't really be running in parallel within a single Node application. A Node application runs your JavaScript on a single thread, and Node's event loop processes only one event at a time. Even on a multi-core box you won't get parallel processing within a single Node application.
That said, you can get processing parallelism on a multi-core machine by forking the code into separate Node processes or by spawning child processes. This, in effect, lets you create multiple instances of Node itself and communicate with those processes in different ways (e.g. stdout, or the IPC mechanism that process fork provides). Additionally, you could choose to separate the functions (by responsibility) into their own Node apps/servers and call them via RPC.
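As a rough sketch of the forking approach, the following runs three calculations in separate Node processes and continues once all have reported back over IPC. The worker script, the `runTask` name, and the summing workload are assumptions made for illustration. The worker is written to a temporary file only so the example is self-contained.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical worker: sums the integers 1..n and sends the result to the parent.
const workerSource = `
process.on('message', function (n) {
  let sum = 0;
  for (let i = 1; i <= n; i++) sum += i;
  process.send(sum);
});
`;
const workerDir = fs.mkdtempSync(path.join(os.tmpdir(), 'worker-'));
const workerPath = path.join(workerDir, 'worker.js');
fs.writeFileSync(workerPath, workerSource);

// Run one task in its own Node process and resolve with its result.
function runTask(n) {
  return new Promise(function (resolve, reject) {
    const child = fork(workerPath);
    child.once('message', function (result) {
      child.disconnect(); // close the IPC channel so the child can exit
      resolve(result);
    });
    child.once('error', reject);
    child.once('exit', function (code) {
      // Only matters if the child died before replying; otherwise already resolved.
      if (code !== 0) reject(new Error('worker exited with code ' + code));
    });
    child.send(n);
  });
}

// Tasks A, B and C run in separate processes; D runs when all are done.
Promise.all([runTask(1e6), runTask(2e6), runTask(3e6)]).then(function D(results) {
  console.log('all tasks finished:', results);
});
```

Starting a process has a cost, so this only pays off when each task is heavy enough to outweigh the startup and message-passing overhead.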
What is the thing done typically by async code to not block the caller (when working with NodeJS)? Is it starting a child process?
It is not starting a new process. Underneath, when async.parallel is used in Node.js, it uses process.nextTick(). nextTick() lets you avoid blocking the caller by deferring work onto a new stack, so you can interleave CPU-intensive tasks, etc.
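The interleaving idea can be sketched by splitting a long calculation into chunks and deferring each chunk. Note the deliberate swap: this sketch uses setImmediate rather than process.nextTick, because nextTick callbacks run before the event loop continues, whereas setImmediate gives pending I/O and timers a turn between chunks. The `sumInChunks` name and the chunk size are hypothetical.

```javascript
// Sum the integers 1..n in slices of `chunkSize`, yielding between slices.
function sumInChunks(n, chunkSize, callback) {
  let sum = 0;
  let i = 1;
  function step() {
    const end = Math.min(i + chunkSize - 1, n);
    for (; i <= end; i++) sum += i;
    if (i > n) {
      callback(null, sum);
    } else {
      setImmediate(step); // let other queued events run before the next slice
    }
  }
  setImmediate(step); // always asynchronous, even for small inputs
}

sumInChunks(1e6, 1e4, function (err, sum) {
  console.log('sum:', sum);
});
```

This keeps the process responsive but does not make the work parallel: the slices still run one after another on the same thread.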
Long story short
Node doesn't make it easy to achieve multiprocessor concurrency "out of the box". Instead, Node gives you a non-blocking design and an event loop that runs on one thread without shared memory. Because no data or memory is shared between threads, locks aren't needed: Node is lock-free. One Node process uses one thread, and this makes Node both safe and powerful.
When you need to split work up among multiple processes, use some sort of message passing to communicate with the other processes/servers, e.g. IPC or RPC.
For more see:
Awesome answer from SO on What is Node.js... with tons of goodness.
Answered by Joshua Allen
Asynchronous and parallel are not the same thing. Asynchronous means that you don't have to wait for an operation to finish before moving on. Parallel means that you can be doing multiple things at the same time. Node.js is asynchronous, but it's only ever one thread, so it can only work on one thing at a time. If you have a long-running computation, you should start another process and then have your Node.js process asynchronously wait for the results.
To do this you could use child_process.spawn and then read the results from the child's stdout.
http://nodejs.org/api/child_process.html#child_process_child_process_spawn_command_args_options
var spawn = require('child_process').spawn;
// Run the computation in a separate process via the shell
var process2 = spawn('sh', ['./computationProgram', 'parameter']);
process2.stderr.on('data', function (data) {
  // handle error output from the child
});
process2.stdout.on('data', function (data) {
  // handle data results (may arrive in several chunks)
});
Answered by wprl
Keep in mind I/O is parallelized by Node.js; only your JavaScript callbacks are single threaded.
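A small sketch of that point: several file reads can be in flight at once while the JavaScript callbacks that receive the results still run one at a time. The temporary files and the `readAll` helper are assumptions made so the example is self-contained.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Start all reads at once; Promise.all keeps results in input order.
function readAll(paths) {
  return Promise.all(paths.map(function (p) {
    return fs.promises.readFile(p, 'utf8');
  }));
}

// Create a few throwaway files to read back.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'io-demo-'));
const files = ['a', 'b', 'c'].map(function (name) {
  const file = path.join(dir, name + '.txt');
  fs.writeFileSync(file, 'contents of ' + name);
  return file;
});

const contents = readAll(files);
contents.then(function (texts) {
  console.log(texts);
});
```

This helps with waiting on I/O, not with heavy calculations, which is why the alternatives in this answer still matter for CPU-bound work.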
Assuming you are writing a server, an alternative to adding the complexity of spawning processes or forking is to simply build stateless node servers and run an instance per core, or better yet run many instances each in their own virtualized micro server. Coordinate incoming requests using a reverse proxy or load balancer.
You could also offload computation to another server, maybe MongoDB (using MapReduce) or Hadoop.
To be truly hardcore, you could write a Node plugin in C++ and have fine-grained control over parallelizing the computation code. The speedup from C++ might remove the need for parallelization anyway.
You can always write code to perform computationally intensive tasks in another language best suited for numeric computation, and e.g. expose them through a REST API.
Finally, you could perhaps run the code on the GPU using node-cuda or something similar, depending on the type of computation (not all computations can be optimized for the GPU).
Yes, you can fork and spawn other processes, but it seems to me that one of the major advantages of Node is not having to worry much about parallelization and threading, and therefore bypassing a great amount of complexity altogether.
Answered by Chad Scira
Depending on your use case you can use something like
task.js: a simplified interface for getting CPU-intensive code to run on all cores (Node.js and web)
An example would be:
function blocking (exampleArgument) {
  // block thread
}
// turn blocking pure function into a worker task
const blockingAsync = task.wrap(blocking);
// run task on an autoscaling worker pool
blockingAsync('exampleArgumentValue').then(result => {
  // do something with result
});
Answered by Joel
I just recently came across parallel.js; it seems to actually use multiple cores and also has map-reduce-type features. http://adambom.github.io/parallel.js/

