Irregular socket errors (10054) on Windows application
Original question: http://stackoverflow.com/questions/10997221/
Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Giorgio
I am working on a Windows (Microsoft Visual C++ 2005) application that uses several processes running on different hosts in an intranet.
Processes communicate with each other using TCP/IP. Different processes can be on the same host or on different hosts (i.e. the communication can be both within the same host or between different hosts).
We currently have a bug that appears irregularly. The communication seems to work for a while, then it stops working. Then it works again for some time.
When the communication does not work, we get an error (apparently while a process was trying to send data). The call looks like this:
send(socket, (char *) data, (int) data_size, 0);
By inspecting the error code we get from
WSAGetLastError()
we see that it is error 10054. Here is what I found in the Microsoft documentation:
WSAECONNRESET
10054
Connection reset by peer.
An existing connection was forcibly closed by the remote host. This normally
results if the peer application on the remote host is suddenly stopped, the
host is rebooted, the host or remote network interface is disabled, or the
remote host uses a hard close (see setsockopt for more information on the
SO_LINGER option on the remote socket). This error may also result if a
connection was broken due to keep-alive activity detecting a failure while
one or more operations are in progress. Operations that were in progress
fail with WSAENETRESET. Subsequent operations fail with WSAECONNRESET.
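The excerpt names two codes: operations in progress fail with WSAENETRESET, later ones with WSAECONNRESET. Below is a minimal sketch of how a sender might treat both as "connection lost". The numeric values match the Winsock constants of the same names. The `k`-prefixed constants, the `FailureKind` enum, and `classify_send_error` are hypothetical names introduced for illustration, not part of Winsock or of the asker's code.

```cpp
// Numeric values of the Winsock error codes named in the excerpt. On Windows
// the real macros come from <winsock2.h>; they are mirrored here under
// hypothetical k-prefixed names so the sketch compiles on any platform.
const int kWSAENETRESET  = 10052;
const int kWSAECONNRESET = 10054;

// Hypothetical classification used only in this sketch.
enum FailureKind {
    kConnectionLost,  // the connection is gone; resending on this socket cannot succeed
    kOtherFailure     // some other error; needs its own handling
};

// Maps an error code (as returned by WSAGetLastError()) to a FailureKind.
FailureKind classify_send_error(int error_code) {
    switch (error_code) {
        case kWSAENETRESET:   // connection broken while an operation was in progress
        case kWSAECONNRESET:  // connection reset by peer
            return kConnectionLost;
        default:
            return kOtherFailure;
    }
}
```

In a Winsock program this would presumably be called as `classify_send_error(WSAGetLastError())` immediately after `send` returns `SOCKET_ERROR`.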
So, as far as I understand, the connection was interrupted by the receiving process. In some cases this error is (AFAIK) correct: one process has terminated and is therefore not reachable. In other cases both the sender and receiver are running and logging activity, but they cannot communicate due to the above error (the error is reported in the logs).
My questions.
- What does the SO_LINGER option mean?
- What is a keep-alive activity and how can it break a connection?
- How is it possible to avoid this problem or recover from it?
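As background to the first question, here is a minimal sketch of the "hard close" the documentation excerpt refers to. The `linger` structure, `setsockopt`, `SOL_SOCKET`, and `SO_LINGER` are the real socket API. The helper names `hard_close_option`, `default_close_option`, `apply_linger`, and the `socket_handle` typedef are hypothetical and exist only for this illustration.

```cpp
#ifdef _WIN32
#include <winsock2.h>          // link with ws2_32.lib on Windows
typedef SOCKET socket_handle;  // hypothetical alias for this sketch
#else
#include <sys/socket.h>
typedef int socket_handle;     // hypothetical alias for this sketch
#endif

// l_onoff != 0 with l_linger == 0 requests a hard (abortive) close: closing
// the socket discards unsent data and resets the connection, so the peer
// sees "connection reset" (10054 on Windows) instead of a graceful shutdown.
struct linger hard_close_option() {
    struct linger opt;
    opt.l_onoff = 1;
    opt.l_linger = 0;
    return opt;
}

// l_onoff == 0 is the default: close returns immediately and the stack
// tries to deliver pending data and shut the connection down gracefully.
struct linger default_close_option() {
    struct linger opt;
    opt.l_onoff = 0;
    opt.l_linger = 0;
    return opt;
}

// Applies a linger setting to an already created socket.
// Returns true if setsockopt succeeded.
bool apply_linger(socket_handle s, const struct linger& opt) {
    return setsockopt(s, SOL_SOCKET, SO_LINGER,
                      reinterpret_cast<const char*>(&opt),
                      static_cast<int>(sizeof(opt))) == 0;
}
```

If the peer process applies something like `hard_close_option()` before closing its socket, the sender can see error 10054 even though nothing crashed.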
Regarding the last question: the first solution we tried (actually, it is more of a workaround) was to resend the message when the error occurs. Unfortunately, the same error occurs over and over again for a while (a few minutes). So this is not a solution.
At the moment we do not know whether we have a software problem or a configuration issue: maybe we should check something in the Windows registry?
One hypothesis was that the OS runs out of ephemeral ports (in case connections are closed but ports are not released because of TcpTimedWaitDelay). After analyzing this, however, we think there should be plenty of them: the problem occurs even when messages are not sent very frequently between processes. Still, we are not 100% sure that we can exclude this: can ephemeral ports get lost in some way?
Another detail that might help is that sending and receiving occur concurrently in each process, in separate threads: are there any shared data structures in the TCP/IP libraries that might get corrupted?
What is also very strange is that the problem occurs irregularly: communication works OK for a few minutes, then it does not work for a few minutes, then it works again.
Thank you for any ideas and suggestions.
EDIT
Thanks for the hints confirming that the only possible explanation was a connection closed error. By further analysis of the problem, we found out that the server-side process of the connection had crashed / had been terminated and had been restarted. So there was a new server process running and listening on the correct port, but the client had not detected this and was still trying to use the old connection. We now have a mechanism to detect such situations and reset the connection on the client side.
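The edit does not show the actual mechanism, so the following is only a sketch of one way such a client-side reset could look, under stated assumptions. The `Transport` struct, its three hooks, and `send_with_reconnect` are hypothetical names. In a Winsock program the hooks would presumably wrap `connect`, `send` plus `WSAGetLastError`, and `closesocket`. Here they are plain callbacks so the logic can be tried without a network.

```cpp
#include <functional>
#include <string>

// Same value as the Winsock macro WSAECONNRESET; hypothetical local name.
const int kWSAECONNRESET = 10054;

// Hypothetical transport hooks supplied by the caller.
struct Transport {
    std::function<bool()> connect;                // open a fresh connection; true on success
    std::function<int(const std::string&)> send;  // 0 on success, otherwise an error code
    std::function<void()> close;                  // drop the current (stale) socket
};

// Sends one message. When the peer has reset the connection, the stale
// socket is closed and a new connection is opened before the next attempt,
// instead of resending on the old connection.
// Returns true once a send attempt succeeds.
bool send_with_reconnect(Transport& t, const std::string& msg, int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        int err = t.send(msg);
        if (err == 0) {
            return true;
        }
        if (err != kWSAECONNRESET) {
            return false;  // not a reset; this sketch does not handle it
        }
        t.close();
        if (!t.connect()) {
            return false;  // server not reachable right now
        }
    }
    return false;
}
```

The point of the design is that a retry only makes sense after the client has discarded the old socket, which matches the observation that plain resending kept failing with the same error.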
Accepted answer by rekire
That error means that the connection was closed by the remote side. So there is nothing you can do in your program except accept that the connection is broken.
Answer by Alexander Galkin
I faced this problem for some days recently and found out that an Adobe Acrobat Reader update was the culprit. As soon as you completely uninstall Adobe from the system, everything returns to normal.