Linux Loopback performance with TCP_NODELAY enabled
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/5832308/
Asked by rns
I recently stumbled on an interesting TCP performance issue while running some performance tests that compared network performance against loopback performance. In my case the network performance exceeded the loopback performance (1Gig network, same subnet). In the case I am dealing with, latencies are crucial, so TCP_NODELAY is enabled. The best theory we have come up with is that TCP congestion control is holding up packets. We did some packet analysis and can definitely see that packets are being held, but the reason is not obvious. Now the questions...
1) In what cases, and why, would communicating over loopback be slower than over the network?
2) When sending as fast as possible, why does toggling TCP_NODELAY have so much more of an impact on maximum throughput over loopback than over the network?
3) How can we detect and analyze TCP congestion control as a potential explanation for the poor performance?
4) Does anyone have any other theories as to the reason for this phenomenon? If yes, any method to prove the theory?
Here is some sample data generated by a simple point-to-point C++ app:
| Transport | Message Size (bytes) | TCP NoDelay | Send Buffer (bytes) | Sender Host | Receiver Host | Throughput (bytes/sec) | Message Rate (msgs/sec) |
|---|---|---|---|---|---|---|---|
| TCP | 128 | On | 16777216 | HostA | HostB | 118085994 | 922546 |
| TCP | 128 | Off | 16777216 | HostA | HostB | 118072006 | 922437 |
| TCP | 128 | On | 4096 | HostA | HostB | 11097417 | 86698 |
| TCP | 128 | Off | 4096 | HostA | HostB | 62441935 | 487827 |
| TCP | 128 | On | 16777216 | HostA | HostA | 20606417 | 160987 |
| TCP | 128 | Off | 16777216 | HostA | HostA | 239580949 | 1871726 |
| TCP | 128 | On | 4096 | HostA | HostA | 18053364 | 141041 |
| TCP | 128 | Off | 4096 | HostA | HostA | 214148304 | 1673033 |
| UnixStream | 128 | - | 16777216 | HostA | HostA | 89215454 | 696995 |
| UnixDatagram | 128 | - | 16777216 | HostA | HostA | 41275468 | 322464 |
| NamedPipe | 128 | - | - | HostA | HostA | 73488749 | 574130 |
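For reference, here is a minimal sketch of how a test client might set the two socket options that vary across the table (TCP_NODELAY and SO_SNDBUF). This is not the original benchmark app; the port, address, loop count, and buffer value are illustrative placeholders.

```cpp
// Minimal sketch, not the original benchmark: configure TCP_NODELAY and
// SO_SNDBUF (the two options varied in the table) and blast 128-byte messages.
// Port, address, loop count, and buffer size are illustrative placeholders.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>   // TCP_NODELAY
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    int nodelay = 1;                        // "TCP NoDelay = On" column
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &nodelay, sizeof(nodelay));

    int sndbuf = 16 * 1024 * 1024;          // "Send Buffer = 16777216" column
    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));

    sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(9000);                        // illustrative port
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);    // loopback case

    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    char msg[128];                          // 128-byte messages as in the table
    memset(msg, 'x', sizeof(msg));
    for (long i = 0; i < 1000000; ++i) {
        if (send(fd, msg, sizeof(msg), 0) < 0) { perror("send"); break; }
    }
    close(fd);
    return 0;
}
```

Note that Linux doubles the value passed to SO_SNDBUF (and caps it at net.core.wmem_max), so the effective send buffer is not exactly the 16777216 or 4096 bytes requested.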
Here are a few more pieces of useful information:
- I only see this issue with small messages
- HostA and HostB both have the same hardware kit (Xeon [email protected], 32 cores total/128 Gig Mem/1Gig Nics)
- OS is RHEL 5.4, kernel 2.6.18-164.2.1.el5
Thank You
Answered by stackmate
1 or 2) I'm not sure why you're bothering to use loopback at all; I personally don't know how closely it will mimic a real interface or how valid the comparison will be. I know that Microsoft disables Nagle for the loopback interface (if you care). Take a look at this link; there's a discussion about it.
3) I would look closely at the first few packets in both cases and see whether you're getting a severe delay in the first five packets. See here.
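If you would rather check this from the application than from a packet capture, one option is to timestamp the first few reads on the receiver and look for an early stall. A rough sketch, assuming an already-connected TCP socket `fd` and the 128-byte messages from the question (this is not part of the original test app):

```cpp
// Sketch: print inter-arrival times for the first few messages on the receiver
// to spot an early stall. Assumes 'fd' is an already-connected TCP socket and
// that messages are 128 bytes, as in the question above.
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <cstdio>

void time_first_messages(int fd, int count) {
    char buf[128];
    timespec prev;
    clock_gettime(CLOCK_MONOTONIC, &prev);      // may need -lrt on older glibc
    for (int i = 0; i < count; ++i) {
        ssize_t n = recv(fd, buf, sizeof(buf), MSG_WAITALL);  // one full message
        if (n <= 0) break;
        timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        long us = (now.tv_sec - prev.tv_sec) * 1000000L +
                  (now.tv_nsec - prev.tv_nsec) / 1000L;
        printf("message %d: %ld us since previous\n", i + 1, us);
        prev = now;
    }
}
```

Gaps of tens of milliseconds between the first few messages would match the "packets being held" observation in the question.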
Answered by csd
1) In what cases, and why, would communicating over loopback be slower than over the network?
Loopback puts the packet setup + TCP checksum calculation for both tx and rx on the same machine, so it needs to do twice as much processing, while with two machines you split the tx/rx work between them. This can have a negative impact on loopback.
2) When sending as fast as possible, why does toggling TCP_NODELAY have so much more of an impact on maximum throughput over loopback than over the network?
Not sure how you've come to this conclusion, but loopback and the network are implemented very differently, and if you try to push them to the limit you will hit different issues. Loopback interfaces (as mentioned in the answer to 1) incur the tx+rx processing overhead on the same machine. NICs, on the other hand, have a number of limits on how many outstanding packets they can hold in their ring buffers, etc., which will cause completely different bottlenecks (and this varies greatly from chip to chip, and even with the switch that sits between the hosts).
3) How can we detect and analyze TCP congestion control as a potential explanation for the poor performance?
Congestion control only kicks in if there is packet loss. Are you seeing packet loss? Otherwise, you're probably hitting limits related to the TCP window size versus network latency.
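One way to check for loss and retransmissions directly from the application on Linux is the TCP_INFO socket option. A hedged sketch follows; the exact set of tcp_info fields varies a little with kernel/glibc version, so drop any your headers don't provide:

```cpp
// Sketch: dump a few congestion-related counters from TCP_INFO on a connected
// socket. Field availability varies slightly by kernel/glibc version.
#include <netinet/in.h>
#include <netinet/tcp.h>   // TCP_INFO, struct tcp_info
#include <sys/socket.h>
#include <cstdio>
#include <cstring>

void dump_tcp_info(int fd) {
    struct tcp_info info;
    memset(&info, 0, sizeof(info));
    socklen_t len = sizeof(info);
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) == 0) {
        printf("cwnd=%u ssthresh=%u rtt=%u us retrans=%u total_retrans=%u lost=%u\n",
               info.tcpi_snd_cwnd, info.tcpi_snd_ssthresh, info.tcpi_rtt,
               info.tcpi_retrans, info.tcpi_total_retrans, info.tcpi_lost);
    } else {
        perror("getsockopt(TCP_INFO)");
    }
}
```

If tcpi_total_retrans and tcpi_lost stay at zero during a slow loopback run, loss-triggered congestion control is an unlikely explanation, which points back at window size or per-packet processing cost instead.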
4) Does anyone have any other theories as to the reason for this phenomenon? If yes, any method to prove the theory?
I don't understand the phenomenon you refer to here. All I see in your table is that you have some sockets with a large send buffer - this can be perfectly legitimate. On a fast machine, your application will certainly be capable of generating more data than the network can pump out, so I'm not sure what you're classifying as a problem here.
One final note: small messages create a much bigger performance hit on your network for various reasons, such as:
- there is a fixed per-packet overhead (MAC + IP + TCP headers), and the smaller the payload, the larger the relative overhead (see the rough arithmetic after this list).
- many NIC limitations are relative to the number of outstanding packets, which means you'll hit NIC bottlenecks with much less data when using smaller packets.
- the network itself has per-packet overhead, so the maximum amount of data you can pump through the network again depends on the packet size.
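As a rough back-of-the-envelope illustration of the first point (the exact numbers depend on TCP options, VLAN tags, etc., so treat them as approximations): if every 128-byte message went out in its own packet, it would carry about 20 bytes of TCP header, 20 bytes of IPv4 header, and 14 bytes of Ethernet header, i.e. roughly 54 bytes (about 42% extra) on top of the payload. Counting TCP timestamps (+12 bytes) and the Ethernet preamble, FCS, and inter-frame gap (~24 bytes more on the wire), each message costs around 218 bytes of wire time, so a 1 Gbit/s link tops out near 570k such packets per second, or only about 73 MB/s of payload. The ~118 MB/s HostA-to-HostB results in the table above therefore suggest the stack is coalescing multiple 128-byte writes into larger segments.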
Answered by Gayanath
This is the same issue I faced, too. When transferring 2 MB of data between two components running on the same RHEL6 machine, it took 7 seconds to complete. When the data size is large, the time is not acceptable: it took 1 minute to transfer 10 MB of data.
Then I tried with TCP_NODELAY disabled. That solved the problem.
This does not happen when the two components are on two different machines.