Original question: http://stackoverflow.com/questions/3491811/
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, link to the original, and attribute it to the original authors (not me): Stack Overflow.
Node.js and CPU intensive requests
Asked by Olivier Lalonde
I've started tinkering with the Node.js HTTP server and really like writing server-side JavaScript, but something is keeping me from starting to use Node.js for my web application.
I understand the whole async I/O concept but I'm somewhat concerned about the edge cases where procedural code is very CPU intensive such as image manipulation or sorting large data sets.
As I understand it, the server will be very fast for simple web page requests such as viewing a listing of users or viewing a blog post. However, if I want to write very CPU-intensive code (in the admin back end, for example) that generates graphics or resizes thousands of images, the request will be very slow (a few seconds). Since this code is not async, every request coming to the server during those few seconds will be blocked until my slow request is done.
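To make the concern concrete, here is a minimal sketch of what "blocked" means on a single thread. The `fib` function and the `measureTimerDelay` helper are hypothetical names made up for this illustration; `fib` just stands in for any CPU-heavy routine such as image resizing. A timer that should fire immediately cannot run until the synchronous work has returned, and incoming HTTP requests would be delayed in the same way.

```javascript
// Hypothetical CPU-bound helper: naive recursive Fibonacci.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Schedules a 0 ms timer, then runs synchronous work on the same thread.
// The timer callback can only fire after `work` has returned.
function measureTimerDelay(work, callback) {
  const scheduledAt = Date.now();
  setTimeout(() => callback(Date.now() - scheduledAt), 0);
  work(); // nothing else in this process runs until this finishes
}

measureTimerDelay(() => fib(30), (delayMs) => {
  console.log(`0 ms timer actually fired after ${delayMs} ms`);
});
```

The reported delay grows with the cost of the work, so raising the argument to `fib` makes the stall more visible.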
One suggestion was to use Web Workers for CPU-intensive tasks. However, I'm afraid Web Workers will make it hard to write clean code, since they work by including a separate JS file. What if the CPU-intensive code is located in an object's method? It kind of sucks to write a JS file for every method that is CPU intensive.
Another suggestion was to spawn a child process, but that makes the code even less maintainable.
Any suggestions to overcome this (perceived) obstacle? How do you write clean object oriented code with Node.js while making sure CPU heavy tasks are executed async?
Accepted answer by Tim
What you need is a task queue! Moving your long-running tasks out of the web server is a GOOD thing. Keeping each task in a "separate" JS file promotes modularity and code reuse. It forces you to think about how to structure your program in a way that will make it easier to debug and maintain in the long run. Another benefit of a task queue is that the workers can be written in a different language. Just pop a task, do the work, and write the response back.
Something like this: https://github.com/resque/resque
Here is an article from GitHub about why they built it: http://github.com/blog/542-introducing-resque
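The "pop a task, do the work, write the response back" loop can be sketched roughly as follows. This is only an in-process illustration of the shape, not how resque works: the `pending` array and `results` map are hypothetical stand-ins for an external queue store, and in a real setup `workOnce` would run in a separate worker process rather than inside the web server. The `sum` task type is likewise invented for the example.

```javascript
let nextId = 1;
const pending = [];         // tasks waiting for a worker (stand-in for a real queue)
const results = new Map();  // task id -> result (stand-in for a result store)

// Called by the web server: record the task and return immediately.
function enqueue(type, payload) {
  const task = { id: nextId++, type, payload };
  pending.push(task);
  return task.id;
}

// Handlers keyed by task type; "sum" is a hypothetical example task.
const handlers = {
  sum: (numbers) => numbers.reduce((a, b) => a + b, 0),
};

// One worker step: pop a task, do the work, write the response back.
function workOnce() {
  const task = pending.shift();
  if (!task) return false;
  results.set(task.id, handlers[task.type](task.payload));
  return true;
}

const id = enqueue('sum', [1, 2, 3]);
while (workOnce()) {} // drain the queue
console.log(results.get(id)); // 6
```

Because the task is plain data (a type plus a payload), the worker side could just as well be written in another language, which is the benefit mentioned above.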
Answer by mbq
This is a misunderstanding of the definition of a web server: it should only be used to "talk" with clients. Heavy-load tasks should be delegated to standalone programs (which of course can also be written in JS).
You'd probably say that it is dirty, but I assure you that a web server process stuck resizing images is just worse (even for, let's say, Apache, where it does not block other queries). Still, you may use a common library to avoid code redundancy.
EDIT: I have come up with an analogy: a web application should be like a restaurant. You have waiters (web server) and cooks (workers). Waiters are in contact with clients and do simple tasks like providing the menu or explaining whether some dish is vegetarian. On the other hand, they delegate harder tasks to the kitchen. Because waiters are doing only simple things, they respond quickly, and cooks can concentrate on their job.
Node.js here would be a single but very talented waiter who can process many requests at a time, and Apache would be a gang of dumb waiters who just process one request each. If this one Node.js waiter began to cook, it would be an immediate catastrophe. Still, cooking could also exhaust even a large supply of Apache waiters, not to mention the chaos in the kitchen and the progressive decrease in responsiveness.
Answer by masonk
You don't want your CPU-intensive code to execute async; you want it to execute in parallel. You need to get the processing work out of the thread that's serving HTTP requests. It's the only way to solve this problem. With Node.js the answer is the cluster module, for spawning child processes to do the heavy lifting. (AFAIK Node doesn't have any concept of threads/shared memory; it's processes or nothing.)
You have two options for how you structure your application. You can get the 80/20 solution by spawning 8 HTTP servers and handling compute-intensive tasks synchronously on the child processes. Doing that is fairly simple. You could take an hour to read about it at that link. In fact, if you just rip off the example code at the top of that link you will get yourself 95% of the way there.
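A rough sketch of that first option, assuming nothing beyond Node's built-in `cluster`, `http`, and `os` modules. The `fib`, `handleRequest`, and `start` names are hypothetical and this is not the example code from the linked documentation. Each forked worker runs its own HTTP server on the shared port, so a request that computes synchronously ties up only the worker that received it.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Hypothetical CPU-heavy task standing in for real work.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Runs synchronously, but blocks only the worker process that got the request.
function handleRequest(req, res) {
  res.end(String(fib(30)));
}

// Newer Node versions expose isPrimary; older ones only have isMaster.
const isPrimary =
  cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

// In the primary process, fork the workers; in each worker, serve HTTP.
function start(port, workerCount) {
  if (isPrimary) {
    for (let i = 0; i < workerCount; i++) cluster.fork();
  } else {
    http.createServer(handleRequest).listen(port);
  }
}
```

To use it, call something like `start(8000, os.cpus().length)` at the bottom of the script. The call has to sit at top level because every forked worker re-runs the same file and takes the worker branch.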
The other way to structure this is to set up a job queue and send big compute tasks over the queue. Note that there is a lot of overhead associated with the IPC for a job queue, so this is only useful when the tasks are appreciably larger than the overhead.
I'm surprised that none of these other answers even mention cluster.
Background: Asynchronous code is code that suspends until something happens somewhere else, at which point the code wakes up and continues execution. One very common case where something slow must happen somewhere else is I/O.
Asynchronous code isn't useful if it's your processor that is responsible for doing the work. That is precisely the case with "compute intensive" tasks.
Now, it might seem that asynchronous code is niche, but in fact it's very common. It just happens not to be useful for compute intensive tasks.
Waiting on I/O is a pattern that always happens in web servers, for example. Every client who connects to your server gets a socket. Most of the time the sockets are empty. You don't want to do anything until a socket receives some data, at which point you want to handle the request. Under the hood an HTTP server like Node is using an eventing library (libev) to keep track of the thousands of open sockets. The OS notifies libev, and then libev notifies Node.js when one of the sockets gets data, and then Node.js puts an event on the event queue, and your HTTP code kicks in at this point and handles the events one after the other. Events don't get put on the queue until the socket has some data, so events are never waiting on data: it's already there for them.
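From the application's side, the flow described above looks roughly like the sketch below, which uses Node's built-in `net` module. The `createEchoServer` name is made up for this illustration. The `'data'` callback is only invoked once a socket actually has bytes to read, so no code sits waiting on an idle connection.

```javascript
const net = require('net');

// Builds a TCP server that echoes back whatever each client sends.
function createEchoServer() {
  return net.createServer((socket) => {
    // Called once per event, only when this socket has data ready.
    socket.on('data', (chunk) => {
      socket.write(chunk);
    });
  });
}
```

Calling `createEchoServer().listen(7000)` (the port is arbitrary) would start accepting connections. Thousands of idle sockets cost almost nothing here because no callback runs for them.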
Single-threaded event-based web servers make sense as a paradigm when the bottleneck is waiting on a bunch of mostly empty socket connections, you don't want a whole thread or process for every idle connection, and you don't want to poll your 250k sockets to find the next one that has data on it.
Answer by Toby Hede
There are a couple of approaches you can use.
As @Tim notes, you can create an asynchronous task that sits outside or parallel to your main serving logic. It depends on your exact requirements, but even cron can act as a queueing mechanism.
WebWorkers can work for your async processes, but they are currently not supported by Node.js. There are a couple of extensions that provide support, for example: http://github.com/cramforce/node-worker
You can still reuse modules and code through the standard "require" mechanism. You just need to ensure that the initial dispatch to the worker passes all the information needed to process the results.

