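Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, released under the CC BY-SA 4.0 license. If you use it, you must also comply with CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/8893888/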
Dropping of connections with tcp_tw_recycle
Asked by user1153755
Summary of the problem
We have a setup with a lot (800 to 2400 per second) of incoming connections to a Linux box, and there is a NAT device between the clients and the server. As a result, many TIME_WAIT sockets are left in the system. To overcome that we set tcp_tw_recycle to 1, but that led to incoming connections being dropped. After browsing the net we found references explaining why frames are dropped when tcp_tw_recycle is combined with a NAT device.
Resolution tried
We then tried setting tcp_tw_reuse to 1 instead. It worked fine, without any issues, with the same setup and configuration.
But the documentation says that tcp_tw_recycle and tcp_tw_reuse should not be used when connections go through TCP state-aware nodes, such as firewalls, NAT devices or load balancers, because frames may be dropped. The more connections there are, the more likely you are to see this issue.
Queries
1) Can tcp_tw_reuse be used in this type of scenario?
2) If not, which part of the Linux code prevents tcp_tw_reuse from being used in such a scenario?
3) Generally, what is the difference between tcp_tw_recycle and tcp_tw_reuse?
Answered by jpetazzo
By default, when both tcp_tw_reuse and tcp_tw_recycle are disabled, the kernel will make sure that sockets in TIME_WAIT state remain in that state long enough to be sure that packets belonging to future connections will not be mistaken for late packets of the old connection.
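To see how many sockets are sitting in TIME_WAIT on a given box, one option is to parse /proc/net/tcp. The sketch below is only an illustration: the function name count_tcp_states is made up for this example, and it only names a few of the state codes (06 is TIME_WAIT in the "st" column); any other code is reported as its raw hex value.

```python
from collections import Counter

# A few of the hexadecimal state codes printed in the "st" column of /proc/net/tcp.
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "0A": "LISTEN"}


def count_tcp_states(lines):
    """Count sockets per TCP state from the lines of /proc/net/tcp.

    `lines` is any iterable of text lines; the first one is the column header.
    """
    counts = Counter()
    for line in list(lines)[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as f:
            print("TIME_WAIT sockets:", count_tcp_states(f)["TIME_WAIT"])
    except OSError:
        print("/proc/net/tcp is not available on this system")
```

This covers IPv4 sockets only; /proc/net/tcp6 has the same layout for IPv6.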
When you enable tcp_tw_reuse, sockets in TIME_WAIT state can be reused before they expire, and the kernel will try to make sure that there is no collision regarding TCP sequence numbers. If you enable tcp_timestamps (which provides PAWS, Protection Against Wrapped Sequences), it will make sure that those collisions cannot happen. However, you need TCP timestamps to be enabled on both ends (at least, that's my understanding). See the definition of tcp_twsk_unique for the gory details.
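As a rough illustration of the reuse decision, here is a simplified Python model. It is not the kernel code: the class and function names are invented for this sketch, and both the "more than one second since the last timestamp" rule and the 65535 + 2 sequence offset are assumptions modelled on my reading of tcp_twsk_unique in older kernel sources.

```python
from dataclasses import dataclass


@dataclass
class TimeWaitSocket:
    # Wall-clock second at which the last timestamp from the peer was seen.
    # 0 means the old connection never negotiated TCP timestamps.
    ts_recent_stamp: int
    # Next sequence number the old connection would have sent.
    snd_nxt: int


def can_reuse(tw, now, tw_reuse_enabled):
    """Model of the reuse check for a new outgoing connection.

    Reuse is allowed only if the old connection used timestamps,
    tcp_tw_reuse is on, and more than one second has passed (assumed rule).
    """
    return bool(tw.ts_recent_stamp) and tw_reuse_enabled and now - tw.ts_recent_stamp > 1


def initial_sequence_after_reuse(tw):
    """Start the new connection well past the old sequence space (assumed offset)."""
    return (tw.snd_nxt + 65535 + 2) % 2**32


if __name__ == "__main__":
    tw = TimeWaitSocket(ts_recent_stamp=100, snd_nxt=1000)
    print(can_reuse(tw, now=102, tw_reuse_enabled=True))   # True
    print(can_reuse(tw, now=102, tw_reuse_enabled=False))  # False
```

The point of the model is that without timestamps on the old connection (ts_recent_stamp of 0) the socket is never reused, which matches the remark that timestamps are needed on both ends.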
When you enable tcp_tw_recycle, the kernel becomes much more aggressive, and will make assumptions about the timestamps used by remote hosts. It will track the last timestamp used by each remote host that has a connection in TIME_WAIT state, and allow a socket to be reused if the timestamp has correctly increased. However, if the timestamp used by the host changes (i.e. warps back in time), the SYN packet will be silently dropped, and the connection won't be established (you will see an error similar to "connect timeout"). If you want to dive into kernel code, the definition of tcp_timewait_state_process might be a good starting point.
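The per-host tracking can be pictured with a small model. This is a sketch under assumptions, not the kernel implementation: PeerTimestampCache and its methods are hypothetical names, and the two constants (a 60 second memory and a tolerance of 1 tick) are modelled on what I believe older kernels call TCP_PAWS_MSL and TCP_PAWS_WINDOW.

```python
PAWS_MSL = 60    # seconds a remembered timestamp stays relevant (assumed)
PAWS_WINDOW = 1  # backwards step, in timestamp ticks, that is still tolerated (assumed)


class PeerTimestampCache:
    """Remembers the last TCP timestamp value seen from each remote IP."""

    def __init__(self):
        self._last = {}  # ip -> (tsval, wall-clock second it was recorded)

    def remember(self, ip, tsval, now):
        """Record the peer's timestamp when its connection enters TIME_WAIT."""
        self._last[ip] = (tsval, now)

    def accept_syn(self, ip, tsval, now):
        """Return False if a SYN from `ip` would be silently dropped."""
        seen = self._last.get(ip)
        if seen is None:
            return True  # nothing remembered for this peer
        last_tsval, stamp = seen
        recent = now - stamp < PAWS_MSL
        went_backwards = last_tsval - tsval > PAWS_WINDOW
        return not (recent and went_backwards)


if __name__ == "__main__":
    cache = PeerTimestampCache()
    cache.remember("203.0.113.7", tsval=5000, now=10)
    print(cache.accept_syn("203.0.113.7", tsval=5100, now=11))  # True
    print(cache.accept_syn("203.0.113.7", tsval=1200, now=12))  # False
```

Note that the drop only happens while the remembered entry is recent; once the entry is older than the assumed 60 seconds, a lower timestamp is accepted again.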
Now, timestamps should never go back in time; unless:
- the host is rebooted (but then, by the time it comes back up, TIME_WAIT sockets will probably have expired, so it will be a non-issue);
- the IP address is quickly reused by something else (TIME_WAIT connections will stay a bit, but other connections will probably be struck by TCP RST and that will free up some space);
- network address translation (or a smarty-pants firewall) is involved in the middle of the connection.
In the latter case, you can have multiple hosts behind the same IP address, and therefore different sequences of timestamps (or the timestamps are randomized at each connection by the firewall). In that case, some hosts will be randomly unable to connect, because they are mapped to a port for which the TIME_WAIT bucket of the server has a newer timestamp. That's why the docs tell you that "NAT devices or load balancers may start drop frames because of the setting".
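The NAT case can be reproduced with a toy simulation. Everything here is an assumption made for illustration: the server is modelled as remembering one timestamp per source IP (recorded when a SYN is accepted, which is a simplification), timestamps tick 1000 times per second, and the two clients, the address 192.0.2.10 and the uptimes are invented.

```python
def simulate(connections, msl=60, window=1):
    """Replay SYNs against a server that tracks one timestamp per source IP.

    `connections` is an iterable of (time_s, source_ip, client_name, tsval).
    Returns the names of the clients whose SYN was dropped, in order.
    """
    last_seen = {}  # source_ip -> (tsval, time_s)
    dropped = []
    for now, ip, client, tsval in connections:
        prev = last_seen.get(ip)
        if prev and now - prev[1] < msl and prev[0] - tsval > window:
            dropped.append(client)  # timestamp appears to go back in time
            continue
        last_seen[ip] = (tsval, now)
    return dropped


NAT_IP = "192.0.2.10"  # both clients appear to the server with this address


def tsval(uptime_at_start, now):
    """Timestamp value of a host, assuming 1000 ticks per second of uptime."""
    return (uptime_at_start + now) * 1000


# Client A has been up for 90000 s, client B for only 300 s.
events = []
for t in range(6):
    client, uptime = ("A", 90000) if t % 2 == 0 else ("B", 300)
    events.append((t, NAT_IP, client, tsval(uptime, t)))

if __name__ == "__main__":
    print(simulate(events))  # ['B', 'B', 'B']
```

Client B is healthy, but because its timestamps are always lower than the ones the server remembers from client A, every one of its SYNs is dropped in this model. Without the NAT (a distinct source IP per client) nothing would be dropped.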
Some people recommend leaving tcp_tw_recycle alone, but enabling tcp_tw_reuse and lowering tcp_fin_timeout. I concur :-)
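For completeness, a minimal sketch of applying that recommendation from Python by writing to the files under /proc/sys/net/ipv4. The helper name apply_tcp_settings is hypothetical, and the value 30 for tcp_fin_timeout is only an example of "lower", not a recommendation from the answer. Writing to the real directory requires root, so the base directory is a parameter.

```python
from pathlib import Path

# tcp_tw_recycle is deliberately absent: it is left alone.
# The timeout value is an illustrative assumption.
RECOMMENDED = {"tcp_tw_reuse": 1, "tcp_fin_timeout": 30}


def apply_tcp_settings(settings, base="/proc/sys/net/ipv4"):
    """Write each value to <base>/<name> and return the previous values."""
    previous = {}
    for name, value in settings.items():
        path = Path(base) / name
        previous[name] = path.read_text().strip()  # keep the old value for rollback
        path.write_text(f"{value}\n")
    return previous
```

Calling apply_tcp_settings(RECOMMENDED) as root changes the running kernel only; the change does not survive a reboot unless it is also placed in the system's sysctl configuration. On kernels from 4.12 onwards, the tcp_tw_recycle file no longer exists at all, as far as I know.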