Original question: http://stackoverflow.com/questions/2706466/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Winsock tcp/ip Socket listening but connection refused, race condition?
Asked by Wayne
This involves two automated unit tests, each of which starts up a TCP/IP server that creates a non-blocking socket, calls bind() and listen(), and then loops on select() waiting for a client that connects and downloads some data.
The catch is that they work perfectly when run separately, but when run as a test suite, the second test client fails to connect with WSAECONNREFUSED...
UNLESS
there is a Thread.Sleep() of several seconds between them??!!!
Interestingly, there is a retry loop that tries to connect again every 1 second after any failure. So the second test loops for a while until it times out after 10 minutes.
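The question does not show the retry code, so the following is only a minimal sketch of a loop like the one described, written in Python for brevity. The function name `connect_with_retry` and its parameters are hypothetical. It opens a fresh socket on every attempt, since portable code should not reuse a socket after a failed connect().

```python
import socket
import time


def connect_with_retry(host, port, retry_interval=1.0, timeout=600.0):
    """Try to connect until it succeeds or `timeout` seconds have elapsed.

    Returns a connected socket, or raises TimeoutError with the last
    connection error attached as the cause.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            # create_connection opens a new socket for every attempt.
            return socket.create_connection((host, port), timeout=retry_interval)
        except OSError as exc:  # includes ConnectionRefusedError
            # Give up if another sleep would take us past the deadline.
            if time.monotonic() + retry_interval > deadline:
                raise TimeoutError(f"gave up after {attempts} attempts") from exc
            time.sleep(retry_interval)
```

With the defaults this mirrors the numbers in the question (one attempt per second, giving up after 10 minutes). Logging the exception on each failed attempt would show whether every attempt is really refused or whether some fail differently.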
During that time, netstat -na shows the correct port number in the LISTEN state for the server socket. So if it is in the listen state, why won't it accept the connection?
In the code, there are log messages that show that select() NEVER even reports a socket as ready to read (which, for a listening socket, means ready to accept a connection).
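For reference, here is a minimal sketch of the server pattern described above (non-blocking socket, bind(), listen(), then select() until the listener is readable). It is written in Python rather than against the Winsock API directly, and the helper names `make_listener` and `accept_one` are made up for illustration. It is not the code from the question.

```python
import select
import socket


def make_listener(host="127.0.0.1", port=0, backlog=5):
    """Create a non-blocking listening socket (port 0 lets the OS pick)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setblocking(False)
    srv.bind((host, port))
    srv.listen(backlog)
    return srv


def accept_one(srv, timeout=5.0):
    """Wait up to `timeout` seconds for a client; return (conn, addr) or None."""
    # A listening socket appears in the "readable" set when a completed
    # connection is waiting in its accept queue.
    readable, _, _ = select.select([srv], [], [], timeout)
    if srv not in readable:
        return None  # nothing arrived before the timeout
    return srv.accept()
```

If a loop like this never sees the listener become readable while the client is being refused, that suggests the connection is being rejected before it ever reaches the accept queue.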
Obviously the problem must be related to some race condition between finishing one test, which means close() and shutdown() on each end of the socket, and the startup of the next.
This wouldn't be so bad if the retry logic allowed it to connect eventually after a couple of seconds. However it seems to get "gummed up" and won't even retry.
However, for some strange reason the listening socket SAYS it's in the LISTEN state even though it keeps refusing connections.
So that means it's the Windoze O/S which is actually catching the SYN packet and returning a RST packet (which means "Connection Refused").
The only other time I ever saw this error was when the code had a problem that caused hundreds of sockets to get stuck in TIME_WAIT state. But that's not the case here. netstat shows only about a dozen sockets with only 1 or 2 in TIME_WAIT at any given moment.
Please help.
Accepted answer by Wayne
The fundamental problem was that, when closing the socket, a thread was trying to read any remaining bytes. That was done in a separate thread which held the read end of the socket open for a fixed number of milliseconds while repeatedly trying to read any data.
That logic has been replaced with code that reads any remaining data more intelligently and closes properly as soon as the read returns 0. So it now closes much more rapidly.
So it turned out to be improper closing of the socket in my own code.
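The fixed code is not shown, so this is only a sketch of the general "drain until the read returns 0, then close" pattern, in Python. The function name `graceful_close` and the timeout parameter are assumptions for illustration.

```python
import socket


def graceful_close(sock, drain_timeout=2.0, bufsize=4096):
    """Half-close, drain until the peer closes, then close.

    Returns the number of leftover bytes that were read and discarded.
    """
    discarded = 0
    try:
        sock.shutdown(socket.SHUT_WR)    # tell the peer we are done sending
        sock.settimeout(drain_timeout)   # each recv() waits at most this long
        while True:
            chunk = sock.recv(bufsize)
            if not chunk:                # recv() returned 0 bytes: peer closed
                break
            discarded += len(chunk)
    except OSError:
        pass  # peer reset, timeout, or socket already disconnected
    finally:
        sock.close()
    return discarded
```

The point of the pattern is that the close finishes as soon as the peer's end-of-stream arrives, instead of holding the socket open for a fixed interval.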
Thanks for all the help!
Answer by Len Holgate
I run lots of tests like this across build machines with various Windows operating systems (XP through Windows 7) with various numbers of cores and I've never seen it be a problem.
I don't believe that the listen socket transitioning to TIME_WAIT is likely to be your problem; I've certainly never seen it, and I regularly run client server tests with the same port where I start and stop servers within the TIME_WAIT delay period.
If you were starting your second server before your first had closed its socket (or if the socket were in TIME_WAIT), then I'd expect your second server to get an error when you attempted to bind().
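As a rough illustration of the first half of that expectation (binding while the first socket is still open), here is a small Python sketch. The function name `second_bind_fails` is hypothetical, and neither socket sets SO_REUSEADDR, which would change the outcome on Windows.

```python
import socket


def second_bind_fails(host="127.0.0.1"):
    """Return True if binding a second socket to an in-use port raises."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))          # port 0: let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same address, first is still open
        except OSError:
            return True                # e.g. WSAEADDRINUSE / EADDRINUSE
        return False
    finally:
        first.close()
        second.close()
```

Since the question reports that the second server reaches the LISTEN state, its bind() presumably succeeded, which is consistent with the point that a leftover listener is unlikely to be the cause.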
Personally I think it's more likely that there's an issue in your code that accepts connections; that is, your test might have found a bug ;)
Can we have a look at the code between your listen and the accept loop?
Do you have the problem if you reverse the order of the tests?
Are the client and server running on the same machine? Does it change things if they aren't?
Etc.
I have some TCP test tools at http://www.lenholgate.com/blog/2005/11/windows-tcpip-server-performance.html. If you set up your test system to run the test client from that link against an example server from http://www.lenholgate.com/blog/2005/11/simple-echo-servers.html, do you still see your problem? (That is, run my server with my client in your test system, the same way it runs your stuff. Does my stuff work?)
Answer by Romain Hippeau
From this MSDN site:
The TIME_WAIT state determines the time that must elapse before TCP can release a closed connection and reuse its resources. This interval between closure and release is known as the TIME_WAIT state or 2MSL state. During this time, the connection can be reopened at much less cost to the client and server than establishing a new connection. The TIME_WAIT behavior is specified in RFC 793 which requires that TCP maintains a closed connection for an interval at least equal to twice the maximum segment lifetime (MSL) of the network. When a connection is released, its socket pair and internal resources used for the socket can be used to support another connection.
Windows TCP reverts to a TIME_WAIT state subsequent to the closing of a connection. While in the TIME_WAIT state, a socket pair cannot be re-used. The TIME_WAIT period is configurable by modifying the following DWORD registry setting that represents the TIME_WAIT period in seconds.
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\TCPIP\Parameters\TcpTimedWaitDelay
By default, the MSL is defined to be 120 seconds. The TcpTimedWaitDelay registry setting defaults to a value 240 seconds, which represents 2 times the maximum segment lifetime of 120 seconds or 4 minutes. However, you can use this entry to customize the interval. Reducing the value of this entry allows TCP to release closed connections faster, providing more resources for new connections. However, if the value is too low, TCP might release connection resources before the connection is complete, requiring the server to use additional resources to re-establish the connection. This registry setting can be set from 0 to 300 seconds.
I think the minimum you can set the value to is 30 (you can try smaller, but it might not work).
You can look at the Winsock Programmer's FAQ for a more detailed explanation.