Haproxy + netty: Way to prevent exceptions on connection reset?

Note: this page mirrors a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute the original authors (not me). Original: http://stackoverflow.com/questions/21550337/


Tags: java, tcp, netty, nio, haproxy

Asked by Benjaminssp

We're using haproxy in front of a netty-3.6-run backend. We are handling a huge number of connections, some of which can be longstanding.

Now the problem is that when haproxy closes a connection in order to rebalance, it does so by sending a TCP RST. When the sun.nio.ch class employed by netty sees this, it throws an IOException: "Connection reset by peer".

Trace:

sun.nio.ch.FileDispatcherImpl.read0(Native Method):1 in ""
sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39):1 in ""
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225):1 in ""
sun.nio.ch.IOUtil.read(IOUtil.java:193):1 in ""
sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375):1 in ""
org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64):1 in ""
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109):1 in ""
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312):1 in ""
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90):1 in ""
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178):1 in ""
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145):1 in ""
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615):1 in ""
java.lang.Thread.run(Thread.java:724):1 in ""

This causes the following problems, depending on the configuration:

option http-pretend-keepalive

This works best (haproxy seems to close most connections with a FIN rather than an RST), but still produces about 3 exceptions per server per second. It also effectively neuters load balancing, because some incoming connections are very long-lived with very high throughput: with pretend-keepalive, haproxy never rebalances them to another server.

option http-keep-alive

Since our backend expects keep-alive connections to really be kept alive (and hence does not close them on its own), this setting means every connection eventually nets one exception, which in turn crashes our servers. We tried adding prefer-last-server, but it doesn't help much.

option http-server-close

In theory, this should give both proper load balancing and no exceptions. However, it seems that after our backend servers respond, there is a race as to which side sends its RST first: haproxy or our registered ChannelFutureListener.CLOSE. In practice, we still get too many exceptions and our servers crash.

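For context, the options above live in the haproxy configuration roughly like this (an illustrative sketch, not the asker's actual config; only one of the three options would be active at a time):

defaults
    mode http
    option http-server-close   # or: option http-pretend-keepalive / option http-keep-alive
    timeout client 60s
    timeout server 60s

backend netty_backend
    balance roundrobin
    server app1 10.0.0.1:8080
    server app2 10.0.0.2:8080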

Interestingly, the more workers we supply our channels with, the more exceptions we generally get. I guess it speeds up reading more than writing.

Anyway, I've been reading up on the different channel and socket options in netty as well as haproxy for a while now, and didn't really find anything that sounded like a solution (or that worked when I tried it).

Accepted answer by Benjaminssp

The Tomcat Nio-handler just does:

} catch (java.net.SocketException e) {
    // SocketExceptions are normal
    Http11NioProtocol.log.debug(
        sm.getString("http11protocol.proto.socketexception.debug"), e);
} catch (java.io.IOException e) {
    // IOExceptions are normal
    Http11NioProtocol.log.debug(
        sm.getString("http11protocol.proto.ioexception.debug"), e);
}

So it seems the initial throw by the internal sun classes (sun.nio.ch.FileDispatcherImpl) really is inevitable unless you reimplement them yourself.

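In the same spirit, the exception can be swallowed on the Netty 3 side with a last-in-pipeline exception handler. A minimal sketch (the handler class is illustrative, not from the original post):

import java.io.IOException;

import org.jboss.netty.channel.ChannelHandlerContext;
import org.jboss.netty.channel.ExceptionEvent;
import org.jboss.netty.channel.SimpleChannelUpstreamHandler;

// Placed last in the pipeline: treats connection resets as routine and
// closes the channel quietly instead of letting the IOException propagate.
public class ConnectionResetSilencer extends SimpleChannelUpstreamHandler {
    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, ExceptionEvent e) throws Exception {
        if (e.getCause() instanceof IOException) {
            // "Connection reset by peer" and friends: expected when haproxy rebalances
            ctx.getChannel().close();
            return;
        }
        super.exceptionCaught(ctx, e); // anything else is still worth surfacing
    }
}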

Answer by user207421

'Connection reset by peer' is usually caused by writing to a connection that has already been closed by the other end. That causes the peer to send an RST. But it almost certainly had already sent a FIN. I would re-examine your assumptions here. Very few applications deliberately send RSTs. What you are most probably encountering is an application protocol error. If that's unavoidable, so is the ECONNRESET.

Answer by Rajashekhar S Choukimath

Try with

  • option http-tunnel
  • no option redispatch

Not sure about the redispatch, but http-tunnel fixed the issue on our end.

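In haproxy configuration terms, that suggestion would look roughly like this (a sketch; the backend name and server addresses are made up):

backend netty_servers
    option http-tunnel       # stop analyzing after the first request/response, just pass bytes
    no option redispatch     # don't re-dispatch a failed connection to another server
    server app1 10.0.0.1:8080
    server app2 10.0.0.2:8080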

Answer by KCD

As of haproxy 1.5, it now sends a FIN (FIN,ACK) to the backend server, whereas haproxy 1.4 used to send an RST. That will probably help in this scenario.

If I can find this documented, I will add the link...

Answer by Bandi Kishore

Note: as per my understanding, you don't have to worry about connection reset exceptions unless you have connection pooling at your end with keep-alive connections.

I faced a similar issue with lots of connection resets (RST) (5-20 times in a 10-second window, depending on load) while using HAProxy for our services.
This is how I fixed it.

We had a system where connections are always kept alive (keep-alive is always true at the HTTP connection level; i.e., once a connection is established, we reuse it from the HTTP connection pool for subsequent calls instead of creating new ones).

Now, as per my debugging in code and in a TCP dump, I found that RSTs were sent by HAProxy in the scenarios below:

  1. When HAProxy's timeout client or timeout server had been reached on an idle connection.
     This was configured as 60 seconds for us. Since we have a pool of connections, when the load on the server decreases, some of these connections go unused for a minute.
     These connections were then closed by HAProxy using an RST signal.

  2. When HAProxy's option prefer-last-server was not set.
     As per the docs:

The real use is for keep-alive connections sent to servers. When this option is used, haproxy will try to reuse the same connection that is attached to the server instead of rebalancing to another server, causing a close of the connection.

Since this was not set, every time a connection was reused from the pool, HAProxy closed it with an RST signal and created a new one to a different server (as our load balancer was set to round-robin). This messed things up and rendered the entire connection pooling useless.

So this is the configuration that worked fine (a haproxy sketch follows the list):

  1. option prefer-last-server: so existing connections to a server will be reused.
     Note: this will NOT cause the load balancer to favor the previous server over a new server for a new connection. The decision for new connections is always based on the load balancing algorithm. This option only applies to an existing connection that is already alive between a client and a server.
     When I tested this option, a new connection still went to server2 even though the connection before it had been sent to server1.
  2. balance leastconn: with round-robin and keep-alive, connections can skew toward a single server. (Say there are just 2 servers, and one goes down due to a deployment; all new connections then start going to the other server. So even when server2 comes back up, round-robin still allocates new requests alternately, one to server1 and one to server2, despite server1 already holding a lot of connections. The server load is never exactly balanced.)
  3. Setting HAProxy's timeout client or timeout server to 10 minutes. This increased how long our connections could stay idle.
  4. Implementing an IdleConnectionMonitor: with the timeout set to 10 minutes, the chance of an RST from HAProxy was reduced but not eliminated.
     To remove it completely, we added an IdleConnectionMonitor responsible for closing connections that had been idle for more than 9 minutes (a sketch appears further below).
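Put together, the haproxy side of these fixes would look roughly like this (a sketch under the assumptions above; the backend name and server addresses are illustrative):

defaults
    mode http
    timeout client 10m
    timeout server 10m

backend pooled_servers
    balance leastconn
    option prefer-last-server
    server app1 10.0.0.1:8080
    server app2 10.0.0.2:8080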


With these configurations, we could

  • Eliminate the connection resets
  • Get connection pooling working
  • Ensure load balancing happens evenly across servers, no matter when they start.
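For the IdleConnectionMonitor mentioned in step 4, a common shape is a scheduled task that evicts idle pooled connections just under haproxy's timeout. A sketch, assuming Apache HttpClient 4.x's pooling connection manager (the class name and schedule are illustrative, not the answerer's actual code):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

// Periodically closes pooled connections that have been idle long enough
// that haproxy (timeout 10m) could reset them on their next use.
public class IdleConnectionMonitor {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public IdleConnectionMonitor(final PoolingHttpClientConnectionManager pool) {
        scheduler.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                // 9 minutes: just under haproxy's 10-minute client/server timeout
                pool.closeIdleConnections(9, TimeUnit.MINUTES);
                pool.closeExpiredConnections();
            }
        }, 1, 1, TimeUnit.MINUTES);
    }

    public void shutdown() {
        scheduler.shutdownNow();
    }
}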

Hope this helps!!
