Warning: this page mirrors a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/14049109/

Long connections with Node.js, how to reduce memory usage and prevent memory leak? Also related with V8 and webkit-devtools

Tags: javascript, linux, node.js, sockets, tcp

Asked by Aaron Wang

Here is what I'm trying to do: I'm developing a Node.js HTTP server which will hold long connections for push purposes (in collaboration with Redis) from tens of thousands of mobile clients on a single machine.

Test environment:

1.80GHz * 2 CPU / 2GB RAM / Ubuntu 12.04 / Node.js 0.8.16

At first, I used the "express" module, with which I could reach about 120k concurrent connections before swap was used, which means the RAM was not enough. Then I switched to the native "http" module and got the concurrency up to about 160k. But I realized that the native http module still has too much functionality I don't need, so I switched to the native "net" module (this means I have to handle the HTTP protocol myself, but that's OK). Now I can reach about 250k concurrent connections on a single machine.

Here is the main structure of my code:

var net = require('net');
var redis = require('redis');

var pendingClients = {};

var redisClient = redis.createClient(26379, 'localhost');
redisClient.on('message', function (channel, message) {
    var client = pendingClients[channel];
    if (client) {
        client.res.write(message);
    }
});

var server = net.createServer(function (socket) {
    var buffer = '';
    socket.setEncoding('utf-8');
    socket.on('data', onData);

    function onData(chunk) {
        buffer += chunk;
        // Parse request data.
        // ...

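        // (Placeholder condition: the real code checks here whether the full request has been received.)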
        if ('I have got all I need') {
            socket.removeListener('data', onData);

            var req = {
                clientId: 'whatever'
            };
            var res = new ServerResponse(socket);
            server.emit('request', req, res);
        }  
    }
});

server.on('request', function (req, res) {
    if (res.socket.destroyed) {            
        return;
    }

    pendingClients[req.clientId] = {
        res: res
    };

    redisClient.subscribe(req.clientId);

    res.socket.on('error', function (err) {
        console.log(err);
    });

    res.socket.on('close', function () {
        delete pendingClients[req.clientId];

        redisClient.unsubscribe(req.clientId);
    });
});

server.listen(3000);

function ServerResponse(socket) {
    this.socket = socket;
}
ServerResponse.prototype.write = function(data) {
    this.socket.write(data);
}

Finally, here are my questions:

  1. How can I reduce the memory usage so that I can increase the concurrency further?

  2. I'm really confused about how to calculate the memory usage of a Node.js process. I know Node.js is powered by Chrome's V8; there is a process.memoryUsage() API and it returns three values: rss/heapTotal/heapUsed. What's the difference between them, which part should I be more concerned with, and what exactly is the composition of the memory used by the Node.js process? (A small snippet showing how I sample these counters is included after this list.)

  3. I'm worried about memory leaks, even though I have done some tests and there doesn't seem to be a problem. Are there any points I should pay attention to, or any advice?

  4. I found a doc about V8 hidden classes. As it describes, does that mean a new hidden class will be generated whenever I add a property named by clientId to my global object pendingClients, just like in my code above? Will that cause a memory leak?

  5. I used webkit-devtools-agent to analyze the heap of the Node.js process. I started the process and took a heap snapshot, then I sent 10k requests to it and disconnected them later; after that I took a heap snapshot again. I used the comparison perspective to see the difference between these two snapshots. Here is what I got: [heap snapshot comparison screenshot] Could anyone explain this? The number and size of (array)/(compiled code)/(string)/Command/Array increased a lot. What does this mean?
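
For question 2 above, this is roughly how I sample those counters while a test is running; the snippet is just a sketch, the interval is arbitrary, and the numbers in the comment are made up for illustration:

// Periodically log the V8/Node memory counters (all values are in bytes).
setInterval(function () {
    var usage = process.memoryUsage();
    // e.g. { rss: 629145600, heapTotal: 209715200, heapUsed: 157286400 }
    console.log('rss: ' + usage.rss +
                ', heapTotal: ' + usage.heapTotal +
                ', heapUsed: ' + usage.heapUsed);
}, 10000);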

EDIT: How did I run the load test?
1. Firstly, I modified some parameters on both the server machine and the client machines (achieving more than 60k concurrency requires more than one client machine, because one machine has only 60k+ ports at most, port numbers being 16-bit).
1.1. On both the server and the client machines, I raised the file descriptor limit with these commands in the shell where the test program would run:

ulimit -Hn 999999
ulimit -Sn 999999

1.2. On the server machine, I also modified some net/TCP related kernel parameters; the most important ones are:

net.ipv4.tcp_mem = 786432 1048576 26777216
net.ipv4.tcp_rmem = 4096 16384 33554432
net.ipv4.tcp_wmem = 4096 16384 33554432

1.3. As to the client machines:

net.ipv4.ip_local_port_range = 1024 65535

2. Secondly, I wrote a custom simulated-client program using Node.js, since most load-test tools (ab, siege, etc.) are built for short connections, while I'm using long connections and have some special requirements (a stripped-down sketch of that client is shown after step 3 below).
3. Then I started the server program on a single machine and three client programs on three separate machines.
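
For illustration, here is a stripped-down sketch of the simulated client mentioned in step 2. HOST, PORT and COUNT are placeholders, the request line is simplified, and the real program staggers connection setup and parses the pushed responses:

var net = require('net');

var HOST = '192.168.1.100';  // server machine (placeholder address)
var PORT = 3000;
var COUNT = 60000;           // connections to open from this client machine

function openConnection(id) {
    var socket = net.connect(PORT, HOST, function () {
        // Send a minimal request once connected, then keep the socket open
        // and wait for data pushed by the server.
        socket.write('GET /poll?clientId=' + id + ' HTTP/1.1\r\n\r\n');
    });
    socket.on('data', function (chunk) {
        // The real client parses pushed messages here.
    });
    socket.on('error', function (err) {
        console.log(err);
    });
}

for (var i = 0; i < COUNT; i++) {
    openConnection(i);
}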

EDIT: I did reach 250k concurrent connections on a single machine (2GB RAM), but it turned out not to be very meaningful or practical, because when a connection was established I just left it pending and did nothing else. When I tried to send responses to those connections, the concurrency dropped to around 150k. As I calculated, that is about 4KB more memory usage per connection; I guess it's related to net.ipv4.tcp_wmem, which I set to 4096 16384 33554432, but even when I made it smaller, nothing changed. I can't figure out why.

EDIT: Actually, now I'm more interested in how much memory each TCP connection uses and what exactly makes up the memory used by a single connection. According to my test data:

150k concurrent connections consumed about 1800M of RAM (from the output of free -m), and the Node.js process had an RSS of about 600M.

Then, I assumed this:

  • (1800M - 600M) / 150k = 8KB: this is the kernel TCP stack memory usage of a single connection. It consists of two parts: read buffer (4KB) + write buffer (4KB). (Actually, this doesn't match my settings of net.ipv4.tcp_rmem and net.ipv4.tcp_wmem above; how does the system determine how much memory to use for these buffers?)

  • 600M / 150k = 4KB: this is the Node.js memory usage of a single connection

Am I right? How can I reduce the memory usage in both aspects?

If there is anything I didn't describe well, let me know and I'll refine it! Any explanations or advice will be appreciated, thanks!

Answered by heartpunk

  1. I think you shouldn't worry about further decreasing memory usage. From that readout you included, it seems you're pretty close to the bare minimum conceivable (I interpret it as being in bytes, which is standard when a unit isn't specified).

  2. This is a more in-depth question than I can answer, but here's what RSS is: the resident set size, i.e. the portion of the process's memory that is currently held in physical RAM. The heap is where dynamically allocated memory comes from on Unix systems, as best I understand. So heapTotal seems like it'd be all that V8 has allocated on the heap for your usage, whereas heapUsed is how much of that allocation you've actually used.

  3. Your memory usage is quite good, and it doesn't seem you actually have a leak. I wouldn't worry yet. =]

  4. Don't know.

  5. This snapshot seems reasonable. I expect some of the objects created from the surge of requests had been garbage collected, and others hadn't. You see there's nothing over 10k objects, and most of these objects are quite small. I call that good.

More importantly, though, I wonder how you're load testing this. I've tried to do massive load testing like this before, and most tools simply can't manage to generate that kind of load on linux, because of the limits on the number of open file descriptors (generally around a thousand per process by default). As well, once a socket is used, it is not immediately available for use again. It takes some significant fraction of a minute, as I recall, to be usable again. Between this and the fact that I've normally seen the system wide open file descriptor limit set somewhere under 100k, I'm not sure it's possible to receive that much load on an unmodified box, or to generate it on a single box. Since you don't mention any such steps, I think you might also need to investigate your load testing, to make sure it's doing what you think.

Answered by Louis Ricci

Just a few notes:

Do you need to wrap res in an object {res: res}? Can't you just assign it directly?

pendingClients[req.clientId] = res;

EDIT: another ~micro-optimization that might help

server.emit('request', req, res);

passes two arguments to 'request', but your request handler really only needs the response 'res'.

res['clientId'] = 'whatever';
server.emit('request', res);

While your amount of actual data remains the same, having one less argument in the 'request' handler's argument list will save you a reference pointer (a few bytes). But a few bytes can add up when you are processing hundreds of thousands of connections. You'll also save the minor CPU overhead of processing the extra argument on the emit call.
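
Putting the two suggestions together, the request path from the question might look roughly like this (a sketch only; the 'error' handler and the rest of the original structure stay as they were):

// In the 'data' handler, emit only the response object, tagged with its clientId.
var res = new ServerResponse(socket);
res.clientId = 'whatever';
server.emit('request', res);

server.on('request', function (res) {
    if (res.socket.destroyed) {
        return;
    }

    // Store the response object directly instead of wrapping it in { res: res }.
    pendingClients[res.clientId] = res;
    redisClient.subscribe(res.clientId);

    res.socket.on('close', function () {
        delete pendingClients[res.clientId];
        redisClient.unsubscribe(res.clientId);
    });
});

The Redis 'message' handler would then call pendingClients[channel].write(message) directly, since the stored value is now the response object itself.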
