How to prevent hangs on SocketInputStream.socketRead0 in Java?

Question

Asked by Piotr Müller

Performing millions of HTTP requests with different Java libraries leaves me with threads hung on:

java.net.SocketInputStream.socketRead0()

This is a native function.

I tried to set up Apache HttpClient and RequestConfig with timeouts on (I hope) everything possible, but I still get (probably infinite) hangs in socketRead0. How do I get rid of them?

The hang ratio is about 1 per 10,000 requests (to 10,000 different hosts), and a hang can apparently last forever (I've confirmed a thread still hung after 10 hours).

JDK 1.8 on Windows 7.

My HttpClient factory:

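import org.apache.http.config.SocketConfig;
import org.apache.http.impl.NoConnectionReuseStrategy;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;

static CloseableHttpClient createHttpClient() {
    SocketConfig socketConfig = SocketConfig.custom()
            .setSoKeepAlive(false)
            .setSoLinger(1)
            .setSoReuseAddress(true)
            .setSoTimeout(5000)       // max inactivity between two reads, in ms
            .setTcpNoDelay(true)
            .build();

    HttpClientBuilder builder = HttpClientBuilder.create();
    builder.disableAutomaticRetries();
    builder.disableContentCompression();
    builder.disableCookieManagement();
    builder.disableRedirectHandling();
    builder.setConnectionReuseStrategy(new NoConnectionReuseStrategy());
    builder.setDefaultSocketConfig(socketConfig);

    // Build from the configured builder; a fresh HttpClientBuilder.create().build()
    // would silently drop every setting above.
    return builder.build();
}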

My RequestConfig factory:

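import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.HttpGet;

static HttpGet createRequest(String url) {
    HttpGet request = new HttpGet(url);

    RequestConfig config = RequestConfig.custom()
            .setCircularRedirectsAllowed(false)
            .setConnectionRequestTimeout(8000) // wait for a pooled connection, ms
            .setConnectTimeout(4000)           // TCP connect, ms
            .setMaxRedirects(1)
            .setRedirectsEnabled(true)
            .setSocketTimeout(5000)            // max inactivity between reads, ms
            .setStaleConnectionCheckEnabled(true)
            .build();
    request.setConfig(config);

    // Return the configured request; a new HttpGet(url) would carry no timeouts.
    return request;
}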

OpenJDK socketRead0 source

Note: I do have one "trick". I can schedule .getConnectionManager().shutdown() on another Thread and cancel that Future if the request finishes properly. However, getConnectionManager() is deprecated, and shutting the manager down kills the whole HttpClient, not just that single request.

Answer 1

Accepted answer by Piotr Müller

For the (blocking) Apache HTTP Client, the best solution I found is to call getConnectionManager() and shut it down.

So in a high-reliability setup I just schedule a shutdown on another thread. If the request does not complete in time, that other thread shuts the manager down.
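
The scheduled-shutdown idea can be sketched with only the JDK. A ScheduledExecutorService closes a resource if the work has not finished by a deadline, and the scheduled close is cancelled when the work does finish in time. This is a minimal sketch, not the answerer's code. The class name `Watchdog` is hypothetical. With Apache HttpClient, the "resource" would be whatever you are willing to close: the connection manager in this answer, or a single request as in a later answer. The sketch relies on the documented behavior of java.net.Socket.close(): a thread blocked in I/O on that socket gets a SocketException.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs a blocking task, closing a resource if it overruns. */
final class Watchdog implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "watchdog");
                t.setDaemon(true); // never keeps the JVM alive
                return t;
            });

    /**
     * Runs task on the calling thread. If it has not returned after timeoutMs,
     * resource is closed from the watchdog thread, so a blocked socket read
     * fails with an exception instead of hanging forever.
     */
    <T> T call(Callable<T> task, Closeable resource, long timeoutMs) throws Exception {
        ScheduledFuture<?> killer = scheduler.schedule(() -> {
            try {
                resource.close();
            } catch (IOException ignored) {
                // nothing more can be done here
            }
        }, timeoutMs, TimeUnit.MILLISECONDS);
        try {
            return task.call();
        } finally {
            killer.cancel(false); // finished (or failed) in time: disarm the timer
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

If the task finishes at the same moment the timer fires, the resource may still get closed. The sketch therefore treats the guarded resource as single-use.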

Answer 2

Answer by ok2c

Given that no one else has responded so far, here is my take.

Your timeout settings look perfectly OK to me. Certain requests probably appear to be constantly blocked in a java.net.SocketInputStream#socketRead0() call because of a combination of misbehaving servers and your local configuration. The socket timeout defines the maximum period of inactivity between two consecutive I/O read operations (in other words, between two consecutive incoming packets). Your socket timeout is 5,000 milliseconds. Suppose the opposite endpoint keeps sending a packet of a chunk-encoded message every 4,999 milliseconds. Then the request will never time out, and it will spend most of its time blocked in java.net.SocketInputStream#socketRead0(). You can find out whether this is the case by running HttpClient with wire logging turned on.
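
The difference between a per-read socket timeout and an overall deadline can be shown with plain java.net sockets. This sketch uses a hypothetical class, `DeadlineReader`. SO_TIMEOUT still bounds each individual read. The loop also enforces a total deadline by shrinking SO_TIMEOUT to the time remaining, so a peer that trickles data just inside the socket timeout can no longer keep the request alive forever. This is an illustration under those assumptions, not HttpClient's behavior. It cannot help if a single native read blocks past SO_TIMEOUT.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical helper: reads a socket to EOF under both a per-read and a total timeout. */
final class DeadlineReader {
    private DeadlineReader() {}

    static byte[] readAll(Socket socket, int readTimeoutMs, long totalTimeoutMs)
            throws IOException {
        if (readTimeoutMs <= 0) {
            throw new IllegalArgumentException("readTimeoutMs must be positive");
        }
        long deadline = System.nanoTime() + totalTimeoutMs * 1_000_000L;
        InputStream in = socket.getInputStream();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (true) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                throw new SocketTimeoutException(
                        "total timeout of " + totalTimeoutMs + " ms exceeded");
            }
            // SO_TIMEOUT only bounds this single read(), so cap it by the time
            // left on the overall deadline (remainingMs >= 1 here, never 0 = infinite).
            socket.setSoTimeout((int) Math.min(readTimeoutMs, remainingMs));
            int n = in.read(buf); // throws SocketTimeoutException after that much silence
            if (n < 0) {
                return out.toByteArray(); // EOF: complete response
            }
            out.write(buf, 0, n);
        }
    }
}
```

With only a 200 ms per-read timeout, a peer sending one byte every 50 ms would keep a plain read loop busy indefinitely. Adding a 400 ms total timeout ends it after roughly 400 ms.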

Answer 3

Answer by Clint

You should consider a non-blocking HTTP client like Grizzly or Netty. These have no blocking operations that can hang a thread.

Answer 4

Answer by vzamanillo

As Clint said, you should consider a non-blocking HTTP client. Alternatively, since you are using Apache HttpClient, implement multithreaded request execution to prevent possible hangs of the main application thread. This does not solve the problem, but it is better than having to restart your app because it froze. Also note that you set the setStaleConnectionCheckEnabled property, but the stale connection check is not 100% reliable. From the Apache HttpClient tutorial:

One of the major shortcomings of the classic blocking I/O model is that the network socket can react to I/O events only when blocked in an I/O operation. When a connection is released back to the manager, it can be kept alive however it is unable to monitor the status of the socket and react to any I/O events. If the connection gets closed on the server side, the client side connection is unable to detect the change in the connection state (and react appropriately by closing the socket on its end).
HttpClient tries to mitigate the problem by testing whether the connection is 'stale', that is no longer valid because it was closed on the server side, prior to using the connection for executing an HTTP request. The stale connection check is not 100% reliable and adds 10 to 30 ms overhead to each request execution.

The Apache HttpComponents crew recommends implementing a connection eviction policy:

The only feasible solution that does not involve a one thread per socket model for idle connections is a dedicated monitor thread used to evict connections that are considered expired due to a long period of inactivity. The monitor thread can periodically call ClientConnectionManager#closeExpiredConnections() method to close all expired connections and evict closed connections from the pool. It can also optionally call ClientConnectionManager#closeIdleConnections() method to close all connections that have been idle over a given period of time.
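
The dedicated monitor thread from the quoted tutorial can be sketched with a ScheduledExecutorService that periodically runs an eviction step. The class name `IdleConnectionMonitor` is hypothetical. To keep the sketch independent of any particular HttpClient version, it takes the eviction step as a Runnable rather than calling the connection manager directly. As a later answer points out, this only cleans up idle pooled connections. It does not help a request that is already stuck in a read.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical monitor that periodically runs an eviction step for pooled connections. */
final class IdleConnectionMonitor implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "idle-connection-monitor");
                t.setDaemon(true);
                return t;
            });

    IdleConnectionMonitor(Runnable evictionStep, long periodMs) {
        // scheduleWithFixedDelay never runs two evictions concurrently.
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                evictionStep.run();
            } catch (RuntimeException e) {
                // An escaping exception would suppress all later runs; swallow it instead.
            }
        }, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With an Apache HttpClient 4.x PoolingHttpClientConnectionManager `cm`, the step could be `() -> { cm.closeExpiredConnections(); cm.closeIdleConnections(30, TimeUnit.SECONDS); }`. The 30-second idle limit is only an example value.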

Take a look at the sample code in the Connection eviction policy section, and try to implement it in your application along with multithreaded request execution. I think implementing both mechanisms will prevent your unwanted hangs.
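
The multithreaded-execution half of this advice can be sketched with a worker pool and Future.get(timeout), which keeps the calling thread responsive even when a worker hangs. This is a minimal JDK-only sketch with a hypothetical class name, `BoundedCaller`. Future.cancel(true) only interrupts the worker, and on a platform thread an interrupt does not unblock a classic blocking socket read. The caller moves on, but the worker may stay stuck until its socket is closed.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical wrapper: runs tasks on daemon workers and bounds how long callers wait. */
final class BoundedCaller implements AutoCloseable {
    private final ExecutorService workers;

    BoundedCaller(int threads) {
        workers = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r, "request-worker");
            t.setDaemon(true); // a hung worker must not keep the JVM alive
            return t;
        });
    }

    /** Returns the task's result, or Optional.empty() if it did not finish in time. */
    <T> Optional<T> call(Callable<T> task, long timeoutMs)
            throws ExecutionException, InterruptedException {
        Future<T> future = workers.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Interrupts the worker; a thread blocked in socketRead0 ignores this,
            // so only the caller is freed, not necessarily the worker.
            future.cancel(true);
            return Optional.empty();
        }
    }

    @Override
    public void close() {
        workers.shutdownNow();
    }
}
```

Because the pool is fixed-size, every worker that stays hung permanently reduces capacity. Pairing this with closing the stuck request or connection, as other answers here do, addresses that.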

Answer 5

Answer by Trevor Robinson

Though this question mentions Windows, I have the same problem on Linux. It appears there is a flaw in the way the JVM implements blocking socket timeouts:

To summarize: the timeout for blocking sockets is implemented by calling poll on Linux (and select on Windows) to determine that data is available before calling recv. However, at least on Linux, both methods can spuriously indicate that data is available when it is not, leading recv to block indefinitely.

From the BUGS section of the poll(2) man page:

See the discussion of spurious readiness notifications under the BUGS section of select(2).

From the BUGS section of the select(2) man page:

Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block.

The Apache HTTP Client code is a bit hard to follow, but it appears that connection expiration is only set for HTTP keep-alive connections (which you've disabled) and is indefinite unless the server specifies otherwise. Therefore, as oleg pointed out, the Connection eviction policy approach won't work in your case and can't be relied upon in general.

Answer 6

Answer by Stefan Matei

I have more than 50 machines, each making about 200k requests per day. They run Amazon Linux AMI 2017.03. I previously had jdk1.8.0_102; now I have jdk1.8.0_131. I use both Apache HttpClient and OkHttp as scraping libraries.

Each machine was running 50 threads, and sometimes threads got lost. After profiling with the YourKit Java profiler I got:

ScraperThread42 State: RUNNABLE CPU usage on sample: 0ms
java.net.SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) SocketInputStream.java (native)
java.net.SocketInputStream.socketRead(FileDescriptor, byte[], int, int, int) SocketInputStream.java:116
java.net.SocketInputStream.read(byte[], int, int, int) SocketInputStream.java:171
java.net.SocketInputStream.read(byte[], int, int) SocketInputStream.java:141
okio.Okio.read(Buffer, long) Okio.java:139
okio.AsyncTimeout.read(Buffer, long) AsyncTimeout.java:211
okio.RealBufferedSource.indexOf(byte, long) RealBufferedSource.java:306
okio.RealBufferedSource.indexOf(byte) RealBufferedSource.java:300
okio.RealBufferedSource.readUtf8LineStrict() RealBufferedSource.java:196
okhttp3.internal.http1.Http1Codec.readResponse() Http1Codec.java:191
okhttp3.internal.connection.RealConnection.createTunnel(int, int, Request, HttpUrl) RealConnection.java:303
okhttp3.internal.connection.RealConnection.buildTunneledConnection(int, int, int, ConnectionSpecSelector) RealConnection.java:156
okhttp3.internal.connection.RealConnection.connect(int, int, int, List, boolean) RealConnection.java:112
okhttp3.internal.connection.StreamAllocation.findConnection(int, int, int, boolean) StreamAllocation.java:193
okhttp3.internal.connection.StreamAllocation.findHealthyConnection(int, int, int, boolean, boolean) StreamAllocation.java:129
okhttp3.internal.connection.StreamAllocation.newStream(OkHttpClient, boolean) StreamAllocation.java:98
okhttp3.internal.connection.ConnectInterceptor.intercept(Interceptor$Chain) ConnectInterceptor.java:42
okhttp3.internal.http.RealInterceptorChain.proceed(Request, StreamAllocation, HttpCodec, Connection) RealInterceptorChain.java:92
okhttp3.internal.http.RealInterceptorChain.proceed(Request) RealInterceptorChain.java:67
okhttp3.internal.http.BridgeInterceptor.intercept(Interceptor$Chain) BridgeInterceptor.java:93
okhttp3.internal.http.RealInterceptorChain.proceed(Request, StreamAllocation, HttpCodec, Connection) RealInterceptorChain.java:92
okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(Interceptor$Chain) RetryAndFollowUpInterceptor.java:124
okhttp3.internal.http.RealInterceptorChain.proceed(Request, StreamAllocation, HttpCodec, Connection) RealInterceptorChain.java:92
okhttp3.internal.http.RealInterceptorChain.proceed(Request) RealInterceptorChain.java:67
okhttp3.RealCall.getResponseWithInterceptorChain() RealCall.java:198
okhttp3.RealCall.execute() RealCall.java:83

I found out that there is a fix for this:

https://bugs.openjdk.java.net/browse/JDK-8172578

in JDK 8u152 (early access). I have installed it on one of our machines and am now waiting to see whether it gives good results.

Answer 7

Answer by Sergei Voitovich

I ran into the same issue using Apache HttpClient.

There's a pretty simple workaround (which doesn't require shutting the connection manager down):

To apply it, execute the request from the question in a new thread, paying attention to these details:

Run the request in a separate thread; close the request and release its connection from a different thread; then interrupt the hanging thread.
Don't run EntityUtils.consumeQuietly(response.getEntity()) in a finally block (it hangs on the "dead" connection).

First, add the interface:

interface RequestDisposer {
    void dispose();
}

Execute the HTTP request in a new thread:

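import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;

// httpClient is the CloseableHttpClient created elsewhere
final AtomicReference<RequestDisposer> requestDisposer = new AtomicReference<>(null);

final Thread thread = new Thread(() -> {
    final HttpGet request = new HttpGet("http://my.url");
    // Lets another thread abort the request and free its connection
    final RequestDisposer disposer = () -> {
        request.abort();
        request.releaseConnection();
    };
    requestDisposer.set(disposer);

    try (final CloseableHttpResponse response = httpClient.execute(request)) {
        ...
    } catch (IOException e) {
        // expected when the request is aborted from another thread
    } finally {
        disposer.dispose();
    }
});
thread.start();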

Call dispose() in the main thread to close the hanging connection:

final RequestDisposer disposer = requestDisposer.get();
if (disposer != null) { // the worker may not have created the request yet
    disposer.dispose();
}
thread.interrupt();
thread.join();
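
The same dispose-then-interrupt pattern can be tried without HttpClient by putting a raw java.net.Socket in place of the HttpGet. This is an illustrative JDK-only sketch, not the answerer's code. `SocketDisposerDemo` and its method are hypothetical, and closing the socket plays the role of request.abort() and releaseConnection(). Per the Socket.close() documentation, a thread blocked in I/O on the socket gets a SocketException, and that is what actually frees the worker. On a platform thread, the interrupt alone would not.

```java
import java.io.IOException;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical demo of the RequestDisposer pattern with a plain socket. */
final class SocketDisposerDemo {
    interface RequestDisposer {
        void dispose();
    }

    /**
     * Starts a worker that blocks reading from host:port, then disposes it from the
     * calling thread after waitMs. Returns the exception the worker ended with, if any.
     */
    static Throwable runAndDispose(String host, int port, long waitMs) throws Exception {
        AtomicReference<RequestDisposer> disposerRef = new AtomicReference<>();
        AtomicReference<Throwable> outcome = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            try (Socket socket = new Socket(host, port)) {
                disposerRef.set(() -> {
                    try {
                        socket.close(); // stands in for request.abort() + releaseConnection()
                    } catch (IOException ignored) {
                        // already closed
                    }
                });
                socket.getInputStream().read(); // blocks: the peer never sends anything
            } catch (IOException e) {
                outcome.set(e);
            }
        });
        worker.start();

        Thread.sleep(waitMs); // give the worker time to reach its blocking read
        RequestDisposer disposer = disposerRef.get();
        if (disposer != null) { // the worker may not have connected yet
            disposer.dispose();
        }
        worker.interrupt();
        worker.join(5_000);
        return outcome.get();
    }
}
```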

That fixed the issue for me.

My stack trace looked like this:

java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:139)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:155)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:284)
at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:253)
at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)
at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)

For anyone interested: the problem is easy to reproduce by interrupting the thread without aborting the request and releasing the connection (the ratio is about 1/100). Windows 10, version 10.0, jdk8.151-x64.

Answer 8

Answer by Gunther Schadow

I feel that all these answers are way too specific.

We have to note that this is probably a real JVM bug. It should be possible to get the file descriptor and close it. All this talk about timeouts is too high-level. You do not want a timeout that makes the connection fail. What you want is the ability to hard-break this stuck thread and stop or interrupt it.

The JVM should implement the SocketInputStream.socketRead function by setting some internal default timeout, possibly as low as 1 second. When that timeout expires, it would immediately loop back into socketRead0. In between, Thread.interrupt and Thread.stop could take effect.
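
The short-internal-timeout idea can be approximated in user code: set a small SO_TIMEOUT and loop, checking the interrupt flag each time a slice times out. This sketch uses a hypothetical class, `InterruptibleReads`, and relies on the documented behavior that a socket stays valid after a SocketTimeoutException. It makes the wait interruptible. It cannot help if the underlying recv itself blocks past SO_TIMEOUT, as described in the answer about spurious poll/select readiness.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical helper: a blocking socket read that honors Thread.interrupt(). */
final class InterruptibleReads {
    private InterruptibleReads() {}

    static int read(Socket socket, byte[] buf, int sliceMs) throws IOException {
        socket.setSoTimeout(sliceMs); // wake up at least every sliceMs
        InputStream in = socket.getInputStream();
        while (true) {
            if (Thread.currentThread().isInterrupted()) {
                throw new InterruptedIOException("read interrupted");
            }
            try {
                return in.read(buf); // bytes read, or -1 at EOF
            } catch (SocketTimeoutException e) {
                // No data during this slice; the socket is still valid, so re-check and retry.
            }
        }
    }
}
```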

An even better way, of course, is not to do any blocking wait at all. Instead, use the select(2) system call with a list of file descriptors, and when any one of them has data available, perform the read on it.
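
Java already exposes this select(2)-style approach through java.nio. You put non-blocking SocketChannels on a Selector and read only from channels that are reported ready. The sketch below uses a hypothetical class, `SelectorReader`, and assumes the channels are already connected. Each read is non-blocking, so a spurious readiness notification yields a 0-byte read instead of a hang, and one thread serves all channels under a single deadline. This is plain NIO, not a full HTTP client; libraries such as Netty build on the same mechanism.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: reads several connected channels with one thread and one deadline. */
final class SelectorReader {
    private SelectorReader() {}

    /** Reads until every channel reaches EOF or timeoutMs elapses; returns what arrived. */
    static Map<SocketChannel, byte[]> readAll(List<SocketChannel> channels, long timeoutMs)
            throws IOException {
        Map<SocketChannel, ByteArrayOutputStream> data = new HashMap<>();
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        int open = channels.size();
        try (Selector selector = Selector.open()) {
            for (SocketChannel ch : channels) {
                ch.configureBlocking(false); // O_NONBLOCK: read() never blocks
                ch.register(selector, SelectionKey.OP_READ);
                data.put(ch, new ByteArrayOutputStream());
            }
            ByteBuffer buf = ByteBuffer.allocate(8192);
            while (open > 0) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    break; // give up on channels that are still silent
                }
                selector.select(remainingMs); // wait for readiness, at most remainingMs
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    SocketChannel ch = (SocketChannel) key.channel();
                    buf.clear();
                    int n = ch.read(buf); // 0 after a spurious wake-up, -1 at EOF
                    if (n < 0) {
                        key.cancel(); // this channel is finished
                        open--;
                    } else if (n > 0) {
                        data.get(ch).write(buf.array(), 0, n);
                    }
                }
            }
        }
        Map<SocketChannel, byte[]> result = new HashMap<>();
        data.forEach((ch, out) -> result.put(ch, out.toByteArray()));
        return result;
    }
}
```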

Just look around the internet at all the people having trouble with threads stuck in java.net.SocketInputStream#socketRead0. It is hands down the most popular topic about java.net.SocketInputStream!

So, while the bug remains unfixed, I wonder about the dirtiest trick I can come up with to break out of this situation. Something like this: connect through the debugger interface to reach the stack frame of the socketRead call, grab the FileDescriptor, break into it to get the int fd number, and then make a native close(2) call on that fd.

Is there any chance we can do that? (Don't tell me "it's not good practice.") If so, let's do it!
