Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/28875406/

Date: 2020-08-11 06:58:59  Source: igfitidea

Why am I seeing lots of sockets in CLOSE_WAIT status when the web service stops working?

java tomcat tcp jetty load-balancing

Asked by Paul Taylor

My Java web service running on Jetty falls over after a few hours, and investigation indicates many sockets in CLOSE_WAIT status. While it is working OK there seem to be no sockets in CLOSE_WAIT status, but when it goes wrong there are loads.

I found this definition

CLOSE-WAIT: The local end-point has received a connection termination request and acknowledged it e.g. a passive close has been performed and the local end-point needs to perform an active close to leave this state.

With netstat on my server I see a list of TCP sockets in CLOSE_WAIT status; the local address is my server and the foreign address is my load balancer machine. So I assume this means the client (the load balancer) has terminated the connection at its end in some improper way, and my server has not properly closed the connection at its end.

But how do I do that? My Java code doesn't deal with low-level sockets.

Or is the load balancer terminating the connection because of an earlier problem caused by something my server code is doing wrong?

Answered by TV Trailers

Is the load balancer still up? Try stopping the load balancer to see whether it, rather than the server, is the issue.

Answered by Eirenliel

Sounds like a bug in Jetty or the JVM; maybe this workaround will work for you: http://www.tux.hk/index.php?entry=entry090521-111844

Add the following lines to /etc/sysctl.conf

net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_intvl = 2
net.ipv4.tcp_keepalive_probes = 2
net.ipv4.tcp_keepalive_time = 1800

And then execute

sysctl -p

or do a reboot

Answered by Alex Fitzpatrick

This probably means you're not cleaning up your incoming connections. Make sure sockets are getting closed at the end of each transaction. (This is best done in a finally block near the beginning of your server code, so that connections get closed even if server-side exceptions occur.)
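
As a rough illustration of that advice, here is a minimal sketch using plain java.net sockets rather than Jetty. The class name CloseInFinallyDemo, the handle method, and the simulated failure are all hypothetical; the point is only that the accepted socket is closed in a finally block, so it is released whether the handler succeeds or throws.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class CloseInFinallyDemo
{
    // Hypothetical request handler; the failure is simulated on purpose.
    static void handle(Socket socket) throws IOException
    {
        OutputStream out = socket.getOutputStream();
        out.write("Hello".getBytes(StandardCharsets.US_ASCII));
        out.flush();
        throw new IllegalStateException("simulated server-side failure");
    }

    // Accepts one loopback connection and reports whether its socket was closed after handling.
    static boolean serveOnce()
    {
        try (ServerSocket serverSocket = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(serverSocket.getInetAddress(), serverSocket.getLocalPort()))
        {
            Socket accepted = serverSocket.accept();
            try
            {
                handle(accepted);
            }
            catch (RuntimeException e)
            {
                System.out.println("Handler failed: " + e.getMessage());
            }
            finally
            {
                accepted.close(); // runs on success and on failure
            }
            return accepted.isClosed();
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println("Accepted socket closed: " + serveOnce());
    }
}
```

In a servlet container such as Jetty the container normally owns the socket, so the equivalent discipline there is to make sure the request-handling method always returns and releases whatever it opened.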

Answered by esaj

I suspect this could be something causing a long or infinite loop/infinite wait in your server code, and Jetty simply never gets a chance to close the connection (unless there's some sort of timeout that forcibly closes the socket after a certain period). Consider the following example:

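import java.io.IOException;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class TestSocketClosedWaitState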
{
    private static class SocketResponder implements Runnable
    {
        private final Socket socket;

        //Using static variable to control the infinite/waiting loop for testing purposes, with while(true) Eclipse would complain of dead code in writer.close() -line
        private static boolean infinite = true;

        public SocketResponder(Socket socket)
        {
            this.socket = socket;
        }       

        @Override
        public void run()
        {
            try
            {               
                PrintWriter writer = new PrintWriter(socket.getOutputStream()); 
                writer.write("Hello");              

                //Simulating slow response/getting stuck in an infinite loop/waiting something that never happens etc.
                do
                {
                    Thread.sleep(5000);
                }
                while(infinite);

                writer.close(); //The socket will stay in CLOSE_WAIT from server side until this line is reached
            }
            catch(Exception e)
            {
                e.printStackTrace();
            }           

            System.out.println("DONE");
        }
    }

    public static void main(String[] args) throws IOException
    {
        ServerSocket serverSocket = new ServerSocket(12345);

        while(true)
        {
            Socket socket = serverSocket.accept();
            Thread t = new Thread(new SocketResponder(socket));
            t.start();
        }       
    }
}

With the infinite variable set to true, the PrintWriter (and the underlying socket) never gets closed because of the infinite loop. If I run this, connect to the socket with telnet, and then quit the telnet client, netstat will show the server-side socket still in CLOSE_WAIT state (I could also see the client-side socket in FIN_WAIT2 state for a while, but it disappears):

~$ netstat -anp | grep 12345
tcp6       0      0 :::12345        :::*            LISTEN      6460/java       
tcp6       1      0 ::1:12345       ::1:34606       CLOSE_WAIT  6460/java   

The server-side accepted socket gets stuck in the CLOSE_WAIT state. If I check the thread stacks for the process, I can see the thread waiting inside the do...while loop:

~$ jstack 6460

<OTHER THREADS>

"Thread-0" prio=10 tid=0x00007f424013d800 nid=0x194f waiting on condition [0x00007f423c50e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at TestSocketClosedWaitState$SocketResponder.run(TestSocketClosedWaitState.java:32)
    at java.lang.Thread.run(Thread.java:701)

<OTHER THREADS...>

If I set the infinite variable to false and do the same (connect a client and disconnect), the socket in CLOSE_WAIT state will show until the writer is closed (which closes the underlying socket), and then it disappears. If the writer or socket is never closed, the server-side socket will again get stuck in CLOSE_WAIT, even if the thread terminates. (I don't think this should occur in Jetty: if your method returns at some point, Jetty should probably take care of closing the socket.)
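
For comparison, here is a small sketch of the well-behaved case, again with plain java.net sockets and a hypothetical class name (PeerCloseDemo). It assumes a loopback connection: once the client closes, a read on the accepted socket returns -1, which is how the application can notice that the peer is gone, and closing the accepted socket is what lets it leave CLOSE_WAIT.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class PeerCloseDemo
{
    // Returns what read() reports on the accepted socket after the client has closed.
    static int readAfterPeerClose()
    {
        try (ServerSocket serverSocket = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(serverSocket.getInetAddress(), serverSocket.getLocalPort());
             Socket accepted = serverSocket.accept())
        {
            client.close(); // the client sends FIN; the accepted socket enters CLOSE_WAIT
            InputStream in = accepted.getInputStream();
            return in.read(); // blocks until the FIN arrives, then signals end of stream
        } // try-with-resources closes the accepted socket here, so it leaves CLOSE_WAIT
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println("read() after peer close: " + readAfterPeerClose());
    }
}
```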

So, the steps I'd suggest to find the culprit are:

  • Add logging to your methods to see where they are going and what they are doing.
  • Check your code: are there any places where execution could get stuck in an infinite loop or take a really long time, preventing the underlying socket from being closed?
  • If it still occurs, take a thread dump of the running Jetty process with jstack the next time the problem occurs, and try to identify any "stuck" threads.
  • Is there a chance that something is thrown (OutOfMemoryError or similar) that is not caught by the underlying Jetty architecture calling your method? I've never peeked inside Jetty's internals, and it could very well be catching Throwables, so this is probably not the issue, but it may be worth checking if all else fails.

You could also name the threads when they enter and exit your methods with something like

        String originalName = Thread.currentThread().getName();
        Thread.currentThread().setName("myMethod");
        try
        {
            //Your code...
        }
        finally
        {
            //Restore the original name even if your code throws
            Thread.currentThread().setName(originalName);
        }

to spot them more easily when there are a lot of threads running.
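
If running jstack against the process is inconvenient, a similar report can be produced from inside the JVM. The following is only a sketch: the class StuckThreadReport and its report method are hypothetical names, and it relies on the standard Thread.getAllStackTraces() call to list the threads whose name contains a given marker, such as the "myMethod" name used above.

```java
import java.util.Map;

public class StuckThreadReport
{
    // Builds a jstack-like report for live threads whose name contains the given marker.
    static String report(String nameMarker)
    {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet())
        {
            Thread thread = entry.getKey();
            if (!thread.getName().contains(nameMarker))
            {
                continue;
            }
            sb.append('"').append(thread.getName()).append("\" ").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue())
            {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException
    {
        // A worker that simulates a stuck request under a descriptive thread name.
        Thread worker = new Thread(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException ignored) { }
        }, "myMethod");
        worker.setDaemon(true);
        worker.start();
        Thread.sleep(200); // give the worker time to start sleeping
        System.out.print(report("myMethod"));
        worker.interrupt();
    }
}
```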

Answered by Vitalii Ivanov

We had the same problem in our project. I'm not sure this applies to your case, but it may be helpful.

The reason was that a huge number of requests were handled by business logic inside a synchronized block. So when the client sent packets to drop the connection, the thread bound to this socket was busy waiting for the monitor.

The logs show exceptions for org.eclipse.jetty.io.WriteFlusher in the write method:

DEBUG org.eclipse.jetty.io.WriteFlusher - write - write exception
org.eclipse.jetty.io.EofException: null
    at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:192) ~[jetty-io-9.2.10.v20150310.jar:9.2.10.v20150310]

and for org.eclipse.jetty.server.HttpOutput in the close method. I think the exception at the close step is the reason for the sockets' CLOSE_WAIT state:

DEBUG org.eclipse.jetty.server.HttpOutput - close -
org.eclipse.jetty.io.EofException: null
    at org.eclipse.jetty.server.HttpConnection$SendCallback.reset(HttpConnection.java:622) ~[jetty-server-9.2.10.v20150310.jar:9.2.10.v20150310]

The quick fix in our case was to increase idleTimeout. The right solution (again, in our case) is code refactoring.

So my advice is to carefully read Jetty's logs at DEBUG level to find exceptions, and to analyze application performance with VisualVM. Maybe the cause is a performance bottleneck (synchronized blocks?).
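
If a synchronized block does turn out to be the bottleneck, one possible refactoring (not necessarily what was done in the project described here) is to replace the monitor with a lock that can time out, so a request thread gives up instead of waiting indefinitely. This is a sketch under that assumption; BoundedLockDemo, runGuarded and attemptWhileBusy are hypothetical names, and only java.util.concurrent classes are used.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class BoundedLockDemo
{
    private static final ReentrantLock LOCK = new ReentrantLock();

    // Runs the task under the lock, or gives up once the timeout has elapsed.
    static boolean runGuarded(Runnable task, long timeoutMillis)
    {
        boolean acquired = false;
        try
        {
            acquired = LOCK.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (!acquired)
            {
                return false; // the caller can answer with an error instead of hanging
            }
            task.run();
            return true;
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            return false;
        }
        finally
        {
            if (acquired)
            {
                LOCK.unlock();
            }
        }
    }

    // Demonstration helper: another thread holds the lock while this thread tries briefly.
    static boolean attemptWhileBusy()
    {
        CountDownLatch holding = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> runGuarded(() -> {
            holding.countDown();
            try { release.await(); } catch (InterruptedException ignored) { }
        }, 1000));
        holder.start();
        try
        {
            holding.await(); // wait until the holder really owns the lock
            return runGuarded(() -> { }, 100);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            return false;
        }
        finally
        {
            release.countDown(); // let the holder finish
        }
    }

    public static void main(String[] args)
    {
        System.out.println("Uncontended call ran: " + runGuarded(() -> { }, 100));
        System.out.println("Call during contention ran: " + attemptWhileBusy());
    }
}
```

Whether failing fast is acceptable depends on the business logic; the timeout values here are arbitrary.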

Answered by Abhishek Gupta

I faced a similar problem. While the culprit code may differ, the symptoms were: 1) the server (Jetty) was running yet not processing requests; 2) there was no extraordinary load and there were no exceptions; 3) there were too many CLOSE_WAIT connections.

These suggested that all the worker threads in the server were stuck somewhere. A jstack thread dump showed that all our worker threads were stuck in the Apache HttpClient object (because of unclosed response objects), and since all the threads were waiting indefinitely, none were available to process incoming requests.
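
The mechanism can be imitated without any HTTP library. The sketch below is not the Apache HttpClient API: Lease, acquire and run are hypothetical stand-ins, with a Semaphore playing the role of a fixed-size connection pool. It assumes a pool of two connections and shows that requests keep being served when every response is closed, while leaving responses unclosed exhausts the pool after two requests.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class LeakedLeaseDemo
{
    // Hypothetical stand-in for a pooled HTTP response; closing it returns the connection.
    static final class Lease implements AutoCloseable
    {
        private final Semaphore pool;

        Lease(Semaphore pool)
        {
            this.pool = pool;
        }

        @Override
        public void close()
        {
            pool.release();
        }
    }

    // Returns a lease, or null if no connection became free within the timeout.
    static Lease acquire(Semaphore pool, long timeoutMillis) throws InterruptedException
    {
        return pool.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS) ? new Lease(pool) : null;
    }

    // Performs the given number of requests and returns how many obtained a connection.
    static int run(int poolSize, int requests, boolean closeResponses)
    {
        Semaphore pool = new Semaphore(poolSize);
        int served = 0;
        try
        {
            for (int i = 0; i < requests; i++)
            {
                Lease lease = acquire(pool, 50);
                if (lease == null)
                {
                    continue; // a real worker thread without a timeout would block here
                }
                served++;
                if (closeResponses)
                {
                    try (Lease response = lease)
                    {
                        // consume the response; try-with-resources returns the connection
                    }
                }
            }
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
        return served;
    }

    public static void main(String[] args)
    {
        System.out.println("Served with responses closed: " + run(2, 5, true));
        System.out.println("Served with responses leaked: " + run(2, 5, false));
    }
}
```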
