Java: What could cause socket ConnectException: Connection timed out?
Original question: http://stackoverflow.com/questions/3877572/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by ColinD
We have a Webstart client that communicates with the server by sending serialized objects over HTTPS using java.net.HttpsURLConnection.
Everything works perfectly fine on my local machine and on test servers located in our office, but I'm experiencing a very, very strange issue which is only occurring on our production and staging servers (and sporadically at that). The main difference I know of between those servers and the ones in our office is that they are located elsewhere and client-server communication with them is considerably slower, but it worked fine for a long time in production prior to this as well.
Anyway, here's what's happening:
- The client, after setting options such as the read timeout and properties such as Content-Type on the HttpURLConnection, calls getOutputStream() on it to get the stream to write to.
- At this point, from what I can tell, the client hangs for some period of time.
- The client then throws the following exception:
java.net.ConnectException: Connection timed out: connect
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.PlainSocketImpl.doConnect(Unknown Source)
	at java.net.PlainSocketImpl.connectToAddress(Unknown Source)
	at java.net.PlainSocketImpl.connect(Unknown Source)
	at java.net.SocksSocketImpl.connect(Unknown Source)
	at java.net.Socket.connect(Unknown Source)
	at com.sun.net.ssl.internal.ssl.SSLSocketImpl.connect(Unknown Source)
	at com.sun.net.ssl.internal.ssl.BaseSSLSocketImpl.connect(Unknown Source)
	at sun.net.NetworkClient.doConnect(Unknown Source)
	at sun.net.www.http.HttpClient.openServer(Unknown Source)
	at sun.net.www.http.HttpClient.openServer(Unknown Source)
	at sun.net.www.protocol.https.HttpsClient.(Unknown Source)
	at sun.net.www.protocol.https.HttpsClient.New(Unknown Source)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(Unknown Source)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source)
	at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(Unknown Source)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(Unknown Source)
Note that this is not a SocketTimeoutException, which the connect() method on HttpURLConnection says it throws if the timeout expires before a connection can be established. Also, when this happens I am able to call conn.getResponseCode() and I get a response code of 200.
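One reading of this distinction (ours, not confirmed in the question): SocketTimeoutException, a subclass of InterruptedIOException, is raised by Java's own timers set through setConnectTimeout() and setReadTimeout(), while ConnectException comes from the operating system's connect() call. The Javadoc says a connect timeout of 0, the default, is interpreted as infinite, so with no setConnectTimeout() call the platform's own TCP connect timeout is what fires, and it surfaces as ConnectException. The minimal sketch below, built around a hypothetical TimeoutKind helper, only encodes that classification.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.ConnectException;

// Hypothetical helper: classifies a failure thrown by HttpURLConnection.
final class TimeoutKind {
    enum Kind { JAVA_TIMEOUT, OS_CONNECT_FAILURE, OTHER }

    static Kind classify(IOException e) {
        // SocketTimeoutException extends InterruptedIOException: raised when a
        // timeout set via setConnectTimeout()/setReadTimeout() expires.
        if (e instanceof InterruptedIOException) {
            return Kind.JAVA_TIMEOUT;
        }
        // ConnectException: the OS-level connect() failed, e.g. the platform's
        // TCP connect timeout expired or the connection was refused.
        if (e instanceof ConnectException) {
            return Kind.OS_CONNECT_FAILURE;
        }
        return Kind.OTHER;
    }
}
```

For example, the exception in the stack trace above would be classified as OS_CONNECT_FAILURE.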
- On the server side, an EOFException is thrown in ObjectInputStream's constructor, which tries to read the serialization header but fails because the client never gets the OutputStream to write to.
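The server-side symptom can be reproduced in isolation: the ObjectInputStream constructor eagerly reads the 4-byte stream header (magic number plus version) that the ObjectOutputStream constructor writes, so an empty or truncated request body fails before any object is read. A minimal in-memory sketch, with a hypothetical SerializationHeader helper standing in for the servlet and client code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical helper illustrating where the server's EOFException comes from.
final class SerializationHeader {
    // True if an ObjectInputStream can be constructed over the body, i.e. the
    // stream header is complete.
    static boolean hasHeader(byte[] body) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(body))) {
            return true;
        } catch (EOFException e) {
            // The constructor ran out of bytes while reading the header,
            // the same failure the server sees when the client writes nothing.
            return false;
        }
    }

    // What the client would have sent: the ObjectOutputStream constructor
    // writes the header immediately, followed by the object itself.
    static byte[] serialize(Object payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(payload);
        }
        return bytes.toByteArray();
    }
}
```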
In case it helps, here are the calls being made on the HttpsURLConnection prior to the call to getOutputStream() (edited to show only the calls being made rather than the whole structure of the code doing this):
HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
conn.setUseCaches(false);
conn.setReadTimeout(30000);
conn.setRequestProperty("Cookie", cookie);
conn.setDoOutput(true);
conn.setRequestProperty("Content-Type", "application/x-java-serialized-object");
conn.getOutputStream();
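This setup sets a read timeout but no connect timeout. A sketch of the same calls with setConnectTimeout() added, so that a slow connect should fail after a bounded, known time with a SocketTimeoutException rather than waiting for the operating system's TCP timeout (if the OS gives up first, a ConnectException is still possible). It uses the HttpURLConnection supertype so it also works for plain HTTP; the ConnectionSetup name and the 15-second value are assumptions for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: the question's setup plus an explicit connect timeout.
final class ConnectionSetup {
    static final int CONNECT_TIMEOUT_MS = 15_000; // illustrative value
    static final int READ_TIMEOUT_MS = 30_000;    // as in the question

    static HttpURLConnection open(URL url, String cookie) throws IOException {
        // openConnection() only creates the object; the socket is opened later,
        // e.g. by connect() or getOutputStream().
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setUseCaches(false);
        // Without this call the connect timeout is 0 (no Java-side limit).
        conn.setConnectTimeout(CONNECT_TIMEOUT_MS);
        conn.setReadTimeout(READ_TIMEOUT_MS);
        conn.setRequestProperty("Cookie", cookie);
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-java-serialized-object");
        return conn;
    }
}
```

For an https URL the result can be cast back: HttpsURLConnection conn = (HttpsURLConnection) ConnectionSetup.open(url, cookie);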
The thing is, I have no idea how any of this could be happening, especially given that it only happens occasionally (no clear pattern of activity that I can tell), and even then only when there's (relatively) high latency between the client and the server.
Given what I've been able to find so far about java.net.ConnectException: Connect timed out, I wondered whether it might be a network or firewall issue on the network our servers are running on... but that doesn't make much sense to me, given that the request is clearly getting through to the servlet. Also, other apps running on the same network have not reported similar issues.
Does anyone have any idea what the cause of this could be, or even what I should investigate?
Accepted answer by JoseK
We have come across this in a situation similar to yours, usually under high load, and it is not easy to reproduce in test. We have not fixed it yet, but these are the steps we went through.
If it were a firewall issue, we would expect a Connection Refused error or a SocketTimeoutException.
1) Are you able to track these requests in the access log on the server? Do they show an HTTP status of 200, 404, or something else? In our case, the server (IIS) logs showed that the client, not the server, closed the connection. So that was a mystery.
Update: If the client always gets a 200, then the server has actually sent back some response, but I suspect the response byte size (if it is recorded in the access logs) will differ from the normal response size for that request.
If it shows the same response size, then you have a (perhaps implausible) situation in which the server actually responded correctly but the client never got the response because the connection was terminated somewhere in between.
2) Our network admin teams looked at the TCP/IP traffic to determine which end (or intermediate router) was terminating the HTTP/TCP conversation. Once you know which end is terminating the connection, the next step is to work out why. Someone knowledgeable enough could run snoop.
3) Is a maximum number of requests configured on the server, and is that throttling your connections?
4) Are there any intermediate load balancers at which requests could be dropped?
Update: One more thing we wanted to do, but did not complete, was to create a static route between client and server to reduce the number of hops in between and rule out network-related connection drops. See http://en.wikipedia.org/wiki/Static_routing
5) Another suggestion is to also set the ConnectTimeout and see whether things work with a higher value. Update: You might want to try conn.getErrorStream(). Its Javadoc says:
Returns the error stream if the connection failed but the server sent useful data nonetheless. If the connection was not connected, or if the server did not have an error while connecting or if the server had an error but no error data was sent, this method will return null.
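A minimal sketch of using getErrorStream() together with getResponseCode(): for status codes of 400 and above, getInputStream() throws an IOException, so whatever body the server sent has to be read from the error stream instead, and that stream may be null. The ResponseReader name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads the response body whatever the status code.
final class ResponseReader {
    static String readBody(HttpURLConnection conn) throws IOException {
        int code = conn.getResponseCode();
        // getInputStream() throws for 4xx/5xx; the body is on the error stream,
        // which is null if the server sent no error data.
        InputStream in = code >= 400 ? conn.getErrorStream() : conn.getInputStream();
        if (in == null) {
            return "";
        }
        try (InputStream body = in) {
            return new String(body.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```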
6) Could also try taking a set of thread dumps on the server 5 seconds apart, to see if any thread shows these incoming requests on the server.
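Thread dumps are normally taken with an external tool such as jstack. As an in-process alternative (our substitution, not what the answer describes), Thread.getAllStackTraces() returns the current stack of every live thread. A rough sketch with a hypothetical ThreadDumps helper that takes several dumps a fixed interval apart:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a rough, jstack-like dump taken from inside the JVM.
final class ThreadDumps {
    // Formats the name, state and stack frames of every live thread.
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" ").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Takes `count` dumps, waiting `intervalMs` milliseconds between them.
    static List<String> dumps(int count, long intervalMs) throws InterruptedException {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                Thread.sleep(intervalMs);
            }
            result.add(dump());
        }
        return result;
    }
}
```

ThreadDumps.dumps(3, 5_000) would take three dumps 5 seconds apart; a request-handling thread that shows the same frames in all of them is a candidate for a stuck request.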
Update: As of today we have learnt to live with this problem, because the failures total 200-300 out of 400,000 requests per day, i.e. a failure rate of 0.05-0.075%.
Answer by Lonzak
We also experienced sporadic timeouts when using it on our servers, and were able to fix them with two changes:
- Use a specific Content-Length via setFixedLengthStreamingMode (this brought the error rate down from ~150 to 10).
- Retry if a timeout occurs (this brought the error rate from 10 to 0; after at most one retry everything went through).
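For the first change, the body length has to be known before connecting, which with serialized objects means serializing to a byte array first. By default HttpURLConnection buffers the whole request body internally; setFixedLengthStreamingMode() instead streams it with a Content-Length header and must be called before the connection is established. A minimal sketch, assuming a hypothetical FixedLengthPost helper:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;

// Hypothetical helper: POSTs a serialized object with a fixed Content-Length.
final class FixedLengthPost {
    // Serialize first so the exact body length is known before connecting.
    static byte[] serialize(Object payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(payload);
        }
        return bytes.toByteArray();
    }

    // Sends the payload and returns the HTTP status code.
    static int post(HttpURLConnection conn, Object payload) throws IOException {
        byte[] body = serialize(payload);
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-java-serialized-object");
        // Must be called before the connection is established.
        conn.setFixedLengthStreamingMode(body.length);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode();
    }
}
```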
In outline (url is the request URL):
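try {
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setConnectTimeout(6000); // 6 s is enough, since a retry is in place
    conn.setReadTimeout(6000);
    // ... write the request body, read the response, etc.
} catch (java.io.InterruptedIOException e) {
    // connect or read timed out (SocketTimeoutException): try again
}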
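A fuller sketch of the retry, assuming a hypothetical Retry helper. Each attempt should create a fresh HttpURLConnection, since one instance is used for a single request. Only InterruptedIOException (which includes SocketTimeoutException) is retried, matching the catch in the code above; anything else, including ConnectException, is rethrown immediately.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

// Hypothetical helper: retries an I/O call only when it times out.
final class Retry {
    @FunctionalInterface
    interface IoCall<T> {
        T call() throws IOException;
    }

    static <T> T withRetries(IoCall<T> call, int maxRetries) throws IOException {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (InterruptedIOException e) {
                // Connect or read timeout: give up once the retries are used.
                if (attempt >= maxRetries) {
                    throw e;
                }
            }
        }
    }
}
```

Usage would look like Retry.withRetries(() -> FixedLengthPost-style code that opens a new connection and posts, 1). Retrying a POST can deliver the same request twice if the first attempt reached the server, so this is only safe when the server-side operation tolerates duplicates.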
Another possible explanation is the following:
The documentation of HttpURLConnection/HttpsURLConnection says:
Each HttpURLConnection instance is used to make a single request but the underlying network connection to the HTTP server may be transparently shared by other instances.
So calling close() alone would be fine, but also calling disconnect() would terminate the socket for the other users of the transparently shared connection, which would then run into a SocketTimeoutException once the timeout period is reached.
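The same Javadoc states that calling close() on the InputStream or OutputStream of an HttpURLConnection has no effect on any shared persistent connection, whereas disconnect() may close the underlying socket if the persistent connection is otherwise idle at that time. A minimal sketch of the stream-only pattern, with a hypothetical KeepAliveFriendly helper: read the body to the end, close the stream, and leave disconnect() out so the JDK may reuse the socket.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a GET that leaves the keep-alive connection reusable.
final class KeepAliveFriendly {
    static String get(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Read the body fully and close the stream; no disconnect() call,
        // so the socket may be kept for a later request to the same host.
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```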