Original source: http://stackoverflow.com/questions/10965201/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
In node.js, how to declare a shared variable that can be initialized by master process and accessed by worker processes?
Asked by Hymany Lee
I want the following
- During startup, the master process loads a large table from a file and saves it into a shared variable. The table has 9 columns and 12 million rows, and is 432MB in size.
- The worker processes run an HTTP server, accepting real-time queries against the large table.
Here is my code, which obviously does not achieve my goal.
    var my_shared_var;
    var cluster = require('cluster');
    var numCPUs = require('os').cpus().length;
    if (cluster.isMaster) {
        // Load a large table from file and save it into my_shared_var,
        // hoping the worker processes can access this shared variable,
        // so that the worker processes do not need to reload the table from file.
        // The loading typically takes 15 seconds.
        my_shared_var = load('path_to_my_large_table');
        // Fork worker processes
        for (var i = 0; i < numCPUs; i++) {
            cluster.fork();
        }
    } else {
        // The following line of code actually outputs "undefined".
        // It seems each process has its own copy of my_shared_var.
        console.log(my_shared_var);
        // Then perform the query against my_shared_var.
        // The query should be performed by the worker processes,
        // otherwise the master process will become a bottleneck.
        var result = query(my_shared_var);
    }
I have tried saving the large table into MongoDB so that each process can easily access the data. But the table is so large that MongoDB takes about 10 seconds to complete my query, even with an index. This is too slow and not acceptable for my real-time application. I have also tried Redis, which holds data in memory. But Redis is a key-value store and my data is a table. I also wrote a C++ program that loads the data into memory, and the query took less than 1 second, so I want to emulate this in node.js.
Accepted answer by Martin Blech
You are looking for shared memory, which node.js simply does not support. You should look for alternatives, such as querying a database or using memcached.
Answer by Shivam
In short, you need to share data held by the master process with the worker processes. This can be done easily with message events:
From master to worker:
    worker.send(jsonData); // In the master; jsonData is any JSON-serializable value
    process.on('message', yourCallbackFunc); // In the worker; the callback receives the message
From worker to master:
    process.send(jsonData); // In the worker
    worker.on('message', yourCallbackFunc); // In the master; the callback receives the message
I hope this way you can send and receive data bidirectionally. Please mark it as answer if you find it useful so that other users can also find the answer. Thanks
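For reference, here is a minimal runnable sketch of that message exchange, assuming a single worker and a tiny in-memory table. The names `loadTable` and `query` are hypothetical stand-ins, not part of any library. Note that a message sent this way is serialized and copied, so the worker ends up with its own copy of the data rather than a shared one; with a 432MB table that cost is paid once per worker.

```javascript
const cluster = require('cluster');

// Hypothetical stand-in for the real loader: returns a tiny table.
function loadTable() {
  return [
    { id: 1, name: 'alpha' },
    { id: 2, name: 'beta' },
  ];
}

// Hypothetical query helper: finds a row by id, or null if absent.
function query(table, id) {
  return table.find((row) => row.id === id) || null;
}

if (cluster.isMaster) {
  const table = loadTable();
  const worker = cluster.fork();

  // The message is serialized and copied into the worker process.
  worker.send({ type: 'table', rows: table });

  worker.on('message', (msg) => {
    console.log('worker answered:', msg.result);
    worker.disconnect(); // close the channel so both processes can exit
  });
} else {
  process.on('message', (msg) => {
    if (msg.type === 'table') {
      // The worker queries its own copy and reports back to the master.
      process.send({ type: 'result', result: query(msg.rows, 2) });
    }
  });
}
```

A real server would keep the received rows in a worker-local variable and answer HTTP requests from it instead of replying once and disconnecting.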
Answer by Vadim Baryshev
In node.js, fork does not work the way it does in C++. It does not copy the current state of the process; it starts a new process. So in this case variables are not shared. Every line of code runs in every process, but only the master process has the cluster.isMaster flag set to true. You need to load your data in every worker process. Be careful if your data is really huge, because every process will have its own copy. I think you should either query parts of the data as you need them, or accept the loading wait if you really need all of it in memory.
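A small sketch of the "load it in every worker" approach, under some loud assumptions: the table is a comma-separated text file, rows are looked up by their first column, and the file path is passed through a hypothetical `TABLE_FILE` environment variable. The helper `buildIndex` is illustrative only, and the demo writes a two-row temporary file in place of the real table.

```javascript
const cluster = require('cluster');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical loader: parses comma-separated lines and indexes
// each row by its first column for fast lookups.
function buildIndex(text) {
  const index = new Map();
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines
    const columns = line.split(',');
    index.set(columns[0], columns);
  }
  return index;
}

if (cluster.isMaster) {
  // Demo data only; the real table would already exist on disk.
  const file = path.join(os.tmpdir(), `table-${process.pid}.csv`);
  fs.writeFileSync(file, 'k1,a,b\nk2,c,d\n');

  const workerCount = 2;
  let pending = workerCount;
  for (let i = 0; i < workerCount; i++) {
    // Each worker gets the path, not the data.
    const worker = cluster.fork({ TABLE_FILE: file });
    worker.on('message', (msg) => {
      console.log(`worker ${worker.id} loaded ${msg.rows} rows`);
      worker.disconnect();
      pending -= 1;
      if (pending === 0) fs.unlinkSync(file); // remove the demo file
    });
  }
} else {
  // Every worker reads the file itself and holds its own copy in memory.
  const index = buildIndex(fs.readFileSync(process.env.TABLE_FILE, 'utf8'));
  process.send({ rows: index.size });
}
```

Memory use grows with the number of workers here, which is exactly the caveat raised above.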
Answer by Allen Luce
If read-only access is fine for your application, try out my own shared memory module. It uses mmap under the covers, so data is loaded as it's accessed and not all at once. The memory is shared among all processes on the machine. Using it is super easy:
    const Shared = require('mmap-object')
    const shared_object = new Shared.Open('table_file')
    console.log(shared_object.property)
It gives you a regular object interface to a key-value store of strings or numbers. It's super fast in my applications.
There is also an experimental read-write version of the module available for testing.
Answer by Reza Roshan
You can use Redis.
Redis is an open source, BSD licensed, advanced key-value cache and store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets, sorted sets, bitmaps and hyperloglogs.
redis.io
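Since the question objects that the data is a table rather than key-value pairs, one possible mapping is to store each row as a Redis hash whose key is derived from the row's first column. The sketch below only builds the key and field map; it does not talk to Redis. The column names and the `row:` key prefix are assumptions made for illustration, and the resulting fields would be written with a hash command such as HSET and read back with HGETALL.

```javascript
// Hypothetical column names; the real table has 9 columns.
const COLUMNS = ['id', 'name', 'score'];

// Build the Redis key and hash fields for one table row.
// Redis stores hash values as strings, so every cell is stringified.
function rowToHash(row) {
  const fields = {};
  COLUMNS.forEach((name, i) => {
    fields[name] = String(row[i]);
  });
  return { key: `row:${row[0]}`, fields };
}

// Rebuild a row (as strings) from the flat field map a hash read returns.
function hashToRow(fields) {
  return COLUMNS.map((name) => fields[name]);
}

const { key, fields } = rowToHash([7, 'alpha', 3.5]);
console.log(key, fields, hashToRow(fields));
```

This layout only supports lookups by the first column; queries on other columns would need additional structures on the Redis side.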