Linux: Why would connect() give EADDRNOTAVAIL?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/3886506/

Date: 2020-08-04 23:37:31  Source: igfitidea

Why would connect() give EADDRNOTAVAIL?

Tags: c++, linux, sockets, tcp, ip-address

Asked by WilliamKF

My application hit a failure that does not seem to be reproducible. A TCP socket connection failed and the application tried to reconnect it. On the second call to connect(), the one attempting to reconnect, I got an error with errno == EADDRNOTAVAIL, which the man page for connect() says means: "The specified address is not available from the local machine."

Looking at the call to connect(), the second argument appears to be the address the error refers to, but as I understand it, this argument is the TCP socket address of the remote host, so I am confused about the man page referring to the local machine. Is it that this address of the remote TCP host is not available from my local machine? If so, why would this be? The first call to connect() must have succeeded, since the connection only failed later, at which point the application tried to reconnect and got this error. The arguments to connect() were the same both times.

Is this a transient error that might have gone away if I had waited long enough and called connect() again? If not, how should I try to recover from this failure?

Accepted answer by David

Check this link

http://www.toptip.ca/2010/02/linux-eaddrnotavail-address-not.html

EDIT: Yes I meant to add more but had to cut it there because of an emergency

Did you close the socket before attempting to reconnect? Closing will tell the system that the socketpair (ip/port) is now free.

Here are additional items to look at:

  • If the local port is already connected to the given remote IP and port (i.e., there's already an identical socketpair), you'll receive this error (see bug link below).
  • Binding a socket to an address that is not a local one will produce this error. If the IP addresses of a machine are 127.0.0.1 and 1.2.3.4, and you're trying to bind to 1.2.3.5, you are going to get this error.
  • EADDRNOTAVAIL: The specified address is unavailable on the remote machine or the address field of the name structure is all zeroes.
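
The non-local bind case in the list above can be reproduced with a small sketch. The helper name try_bind_local is hypothetical, and 192.0.2.1 (a documentation-only address) is assumed not to be configured on the machine; the result also assumes the Linux default of net.ipv4.ip_nonlocal_bind = 0.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to bind a TCP socket to `ip` (dotted quad)
 * with a kernel-chosen port. Returns 0 on success, or the errno value
 * of the call that failed. */
int try_bind_local(const char *ip)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);               /* let the kernel pick a port */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        close(fd);
        return EINVAL;                      /* not a valid IPv4 string */
    }

    int err = 0;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno;                        /* save before close() */
    close(fd);
    return err;
}
```

Calling try_bind_local("192.0.2.1") on a host that does not own that address should return EADDRNOTAVAIL, while "0.0.0.0" (any local address) should return 0.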

Here is a bug report similar to your problem (the answer is near the bottom):

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4294599

It seems that your socket is basically stuck in one of the TCP internal states and that adding a delay for reconnection might solve your problem as they seem to have done in that bug report.
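
A delayed reconnect along those lines might look like the following sketch. The function name reconnect_with_delay and its parameters are hypothetical, and retrying only on EADDRNOTAVAIL is an assumption made for illustration, not something the bug report prescribes.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: close the old descriptor, then try connect() on a
 * fresh socket up to `max_attempts` times, sleeping `delay_seconds`
 * between attempts. Returns the connected fd, or -1 with errno set. */
int reconnect_with_delay(int old_fd, const struct sockaddr_in *peer,
                         int max_attempts, unsigned delay_seconds)
{
    if (old_fd >= 0)
        close(old_fd);                  /* release the old ip/port pair */

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        if (connect(fd, (const struct sockaddr *)peer, sizeof(*peer)) == 0)
            return fd;                  /* connected */

        int saved = errno;              /* close() may overwrite errno */
        close(fd);
        if (saved != EADDRNOTAVAIL) {   /* only this error is retried */
            errno = saved;
            return -1;
        }
        if (attempt + 1 < max_attempts)
            sleep(delay_seconds);       /* give the old state time to clear */
    }
    errno = EADDRNOTAVAIL;
    return -1;
}
```

A new socket is created for every attempt because a descriptor whose connect() has failed is not reliably reusable.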

Answer by mkerley

This can also happen if an invalid port is given, like 0.
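
One way to rule this out is to validate the port before building the address passed to connect(). This is only a sketch; make_peer_addr is a hypothetical helper, not part of any library.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill `out` for an IPv4 peer. Returns 0 on success,
 * -1 if the port is outside 1..65535 or the address does not parse. */
int make_peer_addr(const char *ip, long port, struct sockaddr_in *out)
{
    if (port < 1 || port > 65535)
        return -1;                          /* rejects port 0 */
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons((uint16_t)port);  /* network byte order */
    if (inet_pton(AF_INET, ip, &out->sin_addr) != 1)
        return -1;
    return 0;
}
```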

Answer by ZakW

Another thing to check is that the interface is up. I got confused by this one recently while using network namespaces, since it seems creating a new network namespace produces an entirely independent loopback interface but doesn't bring it up (at least, with Debian wheezy's versions of things). This escaped me for a while since one doesn't typically think of loopback as ever being down.

Answer by Edward Z. Yang

If you are unwilling to change the number of temporary ports available (as suggested by David), or you need more connections than the theoretical maximum, there are two other methods to reduce the number of ports in use. However, both violate the TCP standard to some degree, so they should be used with care.
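
To see how many temporary (ephemeral) ports are currently available, the range can be read from procfs. This is a sketch assuming a Linux system that exposes /proc/sys/net/ipv4/ip_local_port_range; both function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: parse the "low high" pair found in
 * ip_local_port_range. Returns the number of ports in the inclusive
 * range, or -1 if the text is malformed. */
int ephemeral_port_count(const char *text)
{
    int low, high;
    if (sscanf(text, "%d %d", &low, &high) != 2)
        return -1;
    if (low < 1 || high > 65535 || low > high)
        return -1;
    return high - low + 1;
}

/* Read the current range from procfs (Linux-specific path).
 * Returns the port count, or -1 if the file cannot be read. */
int current_ephemeral_port_count(void)
{
    char buf[64];
    FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");
    if (!f)
        return -1;
    char *line = fgets(buf, sizeof(buf), f);
    fclose(f);
    return line ? ephemeral_port_count(buf) : -1;
}
```

A range of 32768 to 60999, for instance, gives 28232 ports, which caps the number of simultaneous outgoing connections to one remote IP and port from one local address.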

The first is to turn on SO_LINGER with a zero-second timeout, forcing the TCP stack to send a RST packet and flush the connection state. There is one subtlety, however: you should call shutdown on the socket file descriptor before you close it, so that you have a chance to send a FIN packet before the RST packet. So the code will look something like:

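// Needs <stdio.h>, <sys/socket.h> and <unistd.h>.
shutdown(fd, SHUT_RDWR);           // send FIN before the reset
struct linger linger;
linger.l_onoff = 1;                // enable SO_LINGER
linger.l_linger = 0;               // zero timeout: close() sends RST
if (setsockopt(fd, SOL_SOCKET, SO_LINGER,
               &linger, sizeof(linger)) < 0)
    perror("setsockopt(SO_LINGER)");
close(fd);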

The server should only see a premature connection reset if the FIN packet gets reordered with the RST packet.

See "TCP option SO_LINGER (zero) - when it's required" for more details. (Experimentally, it doesn't seem to matter where you call setsockopt.)

The second is to use SO_REUSEADDR and an explicit bind (even if you're the client), which will allow Linux to reuse temporary ports when you run, before they are done waiting. Note that you must use bind with INADDR_ANY and port 0, otherwise SO_REUSEADDR is not respected. Your code will look something like:

// Needs <stdio.h>, <string.h>, <netinet/in.h> and <sys/socket.h>.
int opts = 1;
if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
               &opts, sizeof(opts)) < 0)
    perror("setsockopt(SO_REUSEADDR)");

struct sockaddr_in listen_addr;
memset(&listen_addr, 0, sizeof(listen_addr));       // clear padding bytes
listen_addr.sin_family = AF_INET;
listen_addr.sin_port = htons(0);                    // any port
listen_addr.sin_addr.s_addr = htonl(INADDR_ANY);    // any local address
if (bind(fd, (struct sockaddr *) &listen_addr, sizeof(listen_addr)) < 0)
    perror("bind");

// saddr is the struct sockaddr_in you're connecting to
if (connect(fd, (struct sockaddr *) &saddr, sizeof(saddr)) < 0)
    perror("connect");

This option is less effective because you can still saturate the kernel's internal data structures for TCP connections, as counted by netstat -an | grep -e tcp -e udp | wc -l. However, you won't start reusing ports until this happens.

Answer by Surendra Mobiya

I ran into this issue and resolved it by enabling TCP timestamps.

Root cause:

  1. After a connection is closed, it stays in the TIME_WAIT state for some time.

  2. During this state, if a new connection uses the same IP and port and SO_REUSEADDR was not set on the socket, bind() will fail with EADDRINUSE.

  3. Even with SO_REUSEADDR set, connect() may still fail with EADDRNOTAVAIL if TCP timestamps are not enabled on both sides.

Solution: enable TCP timestamps on both sides, client and server.

echo 1 > /proc/sys/net/ipv4/tcp_timestamps

Reason to enable tcp_timestamps:

When we enable tcp_tw_reuse, sockets in TIME_WAIT state can be used before they expire, and the kernel will try to make sure that there is no collision regarding TCP sequence numbers. If we enable tcp_timestamps, it will make sure that those collisions cannot happen. However, we need TCP timestamps to be enabled on both ends. See the definition of tcp_twsk_unique for the gory details.
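
Both settings can be checked from a program before relying on them. This is a sketch assuming the Linux procfs paths shown; read_sysctl_int and tw_reuse_is_effective are hypothetical names, and the check covers only the local host, so the peer's tcp_timestamps setting still has to be verified separately.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer sysctl from procfs. Returns 0 and
 * stores the value on success, -1 if the file is missing or unparsable. */
int read_sysctl_int(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Returns 1 if tcp_timestamps and tcp_tw_reuse are both non-zero on this
 * host, 0 if either is zero, -1 if either value could not be read. */
int tw_reuse_is_effective(void)
{
    int timestamps, reuse;
    if (read_sysctl_int("/proc/sys/net/ipv4/tcp_timestamps", &timestamps) != 0 ||
        read_sysctl_int("/proc/sys/net/ipv4/tcp_tw_reuse", &reuse) != 0)
        return -1;
    return (timestamps != 0 && reuse != 0) ? 1 : 0;
}
```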

reference: https://serverfault.com/questions/342741/what-are-the-ramifications-of-setting-tcp-tw-recycle-reuse-to-1
