Application control of TCP retransmission on Linux

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/5907527/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-05 03:58:02  来源:igfitidea点击:

Application control of TCP retransmission on Linux

linuxtcpnetwork-programming

提问by Petri Lehtinen

For the impatient:

For the impatient:

How to change the value of /proc/sys/net/ipv4/tcp_retries2for a single connection in Linux, using setsockopt(), ioctl()or such, or is it possible?

How to change the value of /proc/sys/net/ipv4/tcp_retries2for a single connection in Linux, using setsockopt(), ioctl()or such, or is it possible?

Longer decription:

Longer decription:

I'm developing an application that uses long polling HTTP requests. On the server side, it needs to be known when the client has closed the connection. The accuracy is not critical, but it certainly cannot be 15 minutes. Closer to a minute would do fine.

I'm developing an application that uses long polling HTTP requests. On the server side, it needs to be known when the client has closed the connection. The accuracy is not critical, but it certainly cannot be 15 minutes. Closer to a minute would do fine.

For those not familiar with the concept, a long polling HTTP request works like this:

For those not familiar with the concept, a long polling HTTP request works like this:

  • The client sends a request
  • The server responds with HTTP headers, but leaves the response open. Chunked transfer encoding is used, allowing the server to sends bits of data as they become available.
  • When all the data is sent, the server sends a "closing chunk" to signal that the response is complete.
  • The client sends a request
  • The server responds with HTTP headers, but leaves the response open. Chunked transfer encoding is used, allowing the server to sends bits of data as they become available.
  • When all the data is sent, the server sends a "closing chunk" to signal that the response is complete.

In my application, the server sends "heartbeats" to the client every now an then (30 seconds by default). A heartbeat is just a newline character that is sent as a response chunk. This is meant to keep the line busy so that we notify the connection loss.

In my application, the server sends "heartbeats" to the client every now an then (30 seconds by default). A heartbeat is just a newline character that is sent as a response chunk. This is meant to keep the line busy so that we notify the connection loss.

There's no problem when the client shuts down correctly. But when it's shut down with force (the client machine loses power, for example), a TCP reset is not sent. In this case, the server sends a heartbeat, which the client doesn't ACK. After this, the server keeps retransmitting the packet for roughly 15 minutes after giving up and reporting the failure to the application layer (our HTTP server). And 15 minutes is too long a wait in my case.

There's no problem when the client shuts down correctly. But when it's shut down with force (the client machine loses power, for example), a TCP reset is not sent. In this case, the server sends a heartbeat, which the client doesn't ACK. After this, the server keeps retransmitting the packet for roughly 15 minutes after giving up and reporting the failure to the application layer (our HTTP server). And 15 minutes is too long a wait in my case.

I can control the retransmission time by writing to the following files in /proc/sys/net/ipv4/:

I can control the retransmission time by writing to the following files in /proc/sys/net/ipv4/:

tcp_retries1 - INTEGER
    This value influences the time, after which TCP decides, that
    something is wrong due to unacknowledged RTO retransmissions,
    and reports this suspicion to the network layer.
    See tcp_retries2 for more details.

    RFC 1122 recommends at least 3 retransmissions, which is the
    default.

tcp_retries2 - INTEGER
    This value influences the timeout of an alive TCP connection,
    when RTO retransmissions remain unacknowledged.
    Given a value of N, a hypothetical TCP connection following
    exponential backoff with an initial RTO of TCP_RTO_MIN would
    retransmit N times before killing the connection at the (N+1)th RTO.

    The default value of 15 yields a hypothetical timeout of 924.6
    seconds and is a lower bound for the effective timeout.
    TCP will effectively time out at the first RTO which exceeds the
    hypothetical timeout.

    RFC 1122 recommends at least 100 seconds for the timeout,
    which corresponds to a value of at least 8.

The default value of tcp_retries2is indeed 8, and my experience of 15 minutes (900 seconds) of retransmission is in line with the kernel documentation quoted above.

The default value of tcp_retries2is indeed 8, and my experience of 15 minutes (900 seconds) of retransmission is in line with the kernel documentation quoted above.

If I change the value of tcp_retries2to 5 for example, the connection dies much more quicker. But setting it like this affects all the connections in the system, and I'd really like to set it for this one long polling connection only.

If I change the value of tcp_retries2to 5 for example, the connection dies much more quicker. But setting it like this affects all the connections in the system, and I'd really like to set it for this one long polling connection only.

A quote from RFC 1122:

A quote from RFC 1122:

4.2.3.5  TCP Connection Failures

   Excessive retransmission of the same segment by TCP
   indicates some failure of the remote host or the Internet
   path.  This failure may be of short or long duration.  The
   following procedure MUST be used to handle excessive
   retransmissions of data segments [IP:11]:

   (a)  There are two thresholds R1 and R2 measuring the amount
        of retransmission that has occurred for the same
        segment.  R1 and R2 might be measured in time units or
        as a count of retransmissions.

   (b)  When the number of transmissions of the same segment
        reaches or exceeds threshold R1, pass negative advice
        (see Section 3.3.1.4) to the IP layer, to trigger
        dead-gateway diagnosis.

   (c)  When the number of transmissions of the same segment
        reaches a threshold R2 greater than R1, close the
        connection.

   (d)  An application MUST be able to set the value for R2 for
        a particular connection.  For example, an interactive
        application might set R2 to "infinity," giving the user
        control over when to disconnect.

   (e)  TCP SHOULD inform the application of the delivery
        problem (unless such information has been disabled by
        the application; see Section 4.2.4.1), when R1 is
        reached and before R2.  This will allow a remote login
        (User Telnet) application program to inform the user,
        for example.

It seems to me that tcp_retries1and tcp_retries2in Linux correspond to R1and R2in the RFC. The RFC clearly states (in item d) that a conforming implementation MUST allow setting the value of R2, but I have found no way to do it using setsockopt(), ioctl()or such.

It seems to me that tcp_retries1and tcp_retries2in Linux correspond to R1and R2in the RFC. The RFC clearly states (in item d) that a conforming implementation MUST allow setting the value of R2, but I have found no way to do it using setsockopt(), ioctl()or such.

Another option would be to get a notification when R1is exceeded (item e). This is not as good as setting R2, though, as I think R1is hit pretty soon (in a few seconds), and the value of R1cannot be set per connection, or at least the RFC doesn't require it.

Another option would be to get a notification when R1is exceeded (item e). This is not as good as setting R2, though, as I think R1is hit pretty soon (in a few seconds), and the value of R1cannot be set per connection, or at least the RFC doesn't require it.

采纳答案by Kimvais

Looks like this was added in Kernel 2.6.37. Commit difffrom kernel Git and Excerpt from change logbelow;

Looks like this was added in Kernel 2.6.37. Commit difffrom kernel Git and Excerpt from change logbelow;

commit dca43c75e7e545694a9dd6288553f55c53e2a3a3 Author: Jerry Chu Date: Fri Aug 27 19:13:28 2010 +0000

tcp: Add TCP_USER_TIMEOUT socket option.

This patch provides a "user timeout" support as described in RFC793. The
socket option is also needed for the the local half of RFC5482 "TCP User
Timeout Option".

TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int,
when > 0, to specify the maximum amount of time in ms that transmitted
data may remain unacknowledged before TCP will forcefully close the
corresponding connection and return ETIMEDOUT to the application. If 
0 is given, TCP will continue to use the system default.

Increasing the user timeouts allows a TCP connection to survive extended
periods without end-to-end connectivity. Decreasing the user timeouts
allows applications to "fail fast" if so desired. Otherwise it may take
upto 20 minutes with the current system defaults in a normal WAN
environment.

The socket option can be made during any state of a TCP connection, but
is only effective during the synchronized states of a connection
(ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, or LAST-ACK).
Moreover, when used with the TCP keepalive (SO_KEEPALIVE) option,
TCP_USER_TIMEOUT will overtake keepalive to determine when to close a
connection due to keepalive failure.

The option does not change in anyway when TCP retransmits a packet, nor
when a keepalive probe will be sent.

This option, like many others, will be inherited by an acceptor from its
listener.

Signed-off-by: H.K. Jerry Chu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

commit dca43c75e7e545694a9dd6288553f55c53e2a3a3 Author: Jerry Chu Date: Fri Aug 27 19:13:28 2010 +0000

tcp: Add TCP_USER_TIMEOUT socket option.

This patch provides a "user timeout" support as described in RFC793. The
socket option is also needed for the the local half of RFC5482 "TCP User
Timeout Option".

TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int,
when > 0, to specify the maximum amount of time in ms that transmitted
data may remain unacknowledged before TCP will forcefully close the
corresponding connection and return ETIMEDOUT to the application. If 
0 is given, TCP will continue to use the system default.

Increasing the user timeouts allows a TCP connection to survive extended
periods without end-to-end connectivity. Decreasing the user timeouts
allows applications to "fail fast" if so desired. Otherwise it may take
upto 20 minutes with the current system defaults in a normal WAN
environment.

The socket option can be made during any state of a TCP connection, but
is only effective during the synchronized states of a connection
(ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, or LAST-ACK).
Moreover, when used with the TCP keepalive (SO_KEEPALIVE) option,
TCP_USER_TIMEOUT will overtake keepalive to determine when to close a
connection due to keepalive failure.

The option does not change in anyway when TCP retransmits a packet, nor
when a keepalive probe will be sent.

This option, like many others, will be inherited by an acceptor from its
listener.

Signed-off-by: H.K. Jerry Chu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

回答by Simone

After some thinking (and googling), I came to the conclusion that you can't change tcp_retries1and tcp_retries2value for a single socket unless you apply some sort of patch to the kernel. Is that feasible for you?

After some thinking (and googling), I came to the conclusion that you can't change tcp_retries1and tcp_retries2value for a single socket unless you apply some sort of patch to the kernel. Is that feasible for you?

Otherways, you could use TCP_KEEPALIVEsocket option whose purpose is to check if a connection is still active (it seems to me that that's exactly what you are trying to achieve, so it has sense). Pay attention to the fact that you need to tweak its default parameter a little, because the default is to disconnect after about 2 hrs!!!

Otherways, you could use TCP_KEEPALIVEsocket option whose purpose is to check if a connection is still active (it seems to me that that's exactly what you are trying to achieve, so it has sense). Pay attention to the fact that you need to tweak its default parameter a little, because the default is to disconnect after about 2 hrs!!!

回答by caf

I suggest that if the TCP_USER_TIMEOUTsocket option described by Kimvaisis available, you use that. On older kernels where that socket option is not present, you could repeatedly call the SIOCOUTQioctl()to determine the size of the socket send queue - if the send queue doesn't decrease over your timeout period, that indicates that no ACKs have been received and you can close the socket.

I suggest that if the TCP_USER_TIMEOUTsocket option described by Kimvaisis available, you use that. On older kernels where that socket option is not present, you could repeatedly call the SIOCOUTQioctl()to determine the size of the socket send queue - if the send queue doesn't decrease over your timeout period, that indicates that no ACKs have been received and you can close the socket.

回答by Rndp13

This is for my understanding. tcp_retries2 is the number of retransmission that is permitted by the system before droping the conection.So if we want to change the default value of tcp_retries2 using TCP_USER_TIMEOUT which specifies the maximum amount of time transmitted data may remain unacknowledged, we have to increase the value of TCP_USER_TIMEOUT right?

This is for my understanding. tcp_retries2 is the number of retransmission that is permitted by the system before droping the conection.So if we want to change the default value of tcp_retries2 using TCP_USER_TIMEOUT which specifies the maximum amount of time transmitted data may remain unacknowledged, we have to increase the value of TCP_USER_TIMEOUT right?

In that case the conction will wait for a longer time and will not retransmit the data packet. Please correct me, if something is wrong.

In that case the conction will wait for a longer time and will not retransmit the data packet. Please correct me, if something is wrong.

回答by Alessar

int name[] = {CTL_NET, NET_IPV4, NET_IPV4_TCP_RETRIES2};
long value = 0;
size_t size = sizeof(value);
if(!sysctl(name, sizeof(name)/sizeof(name[0]), &value, &size, NULL, 0) {
  value // It contains current value from /proc/sys/net/ipv4/tcp_retries2
}
value = ... // Change value if it needed
if(!sysctl(name, sizeof(name)/sizeof(name[0]), NULL, NULL, &value, size) {
  // Value in /proc/sys/net/ipv4/tcp_retries2 changed successfully
}

Programmatically way using C. It works at least on Ubuntu. But (according with code and system variables) looks like it influences on all TCP connections in system, not only one single connection.

Programmatically way using C. It works at least on Ubuntu. But (according with code and system variables) looks like it influences on all TCP connections in system, not only one single connection.