Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/16191236/

Time: 2020-10-31 22:10:22  Source: igfitidea

Tomcat startup fails due to 'java.net.SocketException Invalid argument' on Mac OS X

tomcat, java-native-interface, java

Asked by Danny Thomas

We have an application that runs on Tomcat 6 (6.0.35.0 to be precise), and most of our engineers on Mac OS are having problems starting Tomcat due to the socketAccept call in the Catalina.await method throwing a SocketException:

SEVERE: StandardServer.await: accept:
java.net.SocketException: Invalid argument
      at java.net.PlainSocketImpl.socketAccept(Native Method)
      at java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java)
      at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
      at java.net.ServerSocket.implAccept(ServerSocket.java:522)
      at java.net.ServerSocket.accept(ServerSocket.java:490)
      at org.apache.catalina.core.StandardServer.await(StandardServer.java:431)
      at org.apache.catalina.startup.Catalina.await(Catalina.java:676)
      at org.apache.catalina.startup.Catalina.start(Catalina.java:628)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:601)
      at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
      at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
      at mycompany.tomcat.startup.ThreadDumpWrapper.main(ThreadDumpWrapper.java:260)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:601)
      at org.tanukisoftware.wrapper.WrapperStartStopApp.run(WrapperStartStopApp.java:238)
      at java.lang.Thread.run(Thread.java:722)

This causes Tomcat to shut down immediately after startup (and no small amount of rage). We think the problem has been with us for as long as we've run Java 1.7 on Mac OS; over the last several months many of us have switched to MacBook Pros. Until now, the only symptom was an occasional zero-byte response from Tomcat, caused by the same exception being thrown from socketRead. The errors don't reach the logs, so we had each shrugged it off as an isolated problem. We only found the cause once the startup failures began and I set a breakpoint on SocketException:

Daemon Thread [http-8080-1] (Suspended (breakpoint at line 47 in SocketException))  
  SocketException.<init>(String) line: 47 
  SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) line: not available [native method] 
  SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) line: not available  
  SocketInputStream.read(byte[], int, int, int) line: 150 
  SocketInputStream.read(byte[], int, int) line: 121  
  InternalInputBuffer.fill() line: 735  
  InternalInputBuffer.parseRequestLine() line: 366  
  Http11Processor.process(Socket) line: 814 
  Http11Protocol$Http11ConnectionHandler.process(Socket) line: 602  
  JIoEndpoint$Worker.run() line: 489  
  Thread.run() line: 722  

For arguments:

arg0  FileDescriptor  (id=499)  
  fd  1097  
  useCount  AtomicInteger  (id=503) 
    value 2 
arg1  (id=502)
arg2  0 
arg3  8192  
arg4  20000 

The problem is time sensitive. Increasing startup time due to application changes (lots more Spring introspection/singleton overhead) seems to be the factor that causes this to affect Tomcat startup; the tipping point being about 160 seconds. We can mitigate the problem by disabling some of the non-mandatory contexts we don't need during development to reduce startup time, but I'd prefer to find the root cause.

Application configuration

The specifics of the application are far too complex to go into too much detail, but I have a hunch that this might relate to an earlier bind, so I'll at least list the listening ports on my machine:

localhost:32000 - Java service wrapper port
*:10001         - RMI registry
*:2322          - Java debug
*:56566         - RMI
*:8180          - Tomcat HTTP connector
*:8543          - Tomcat HTTPS connector
*:2223          - Tomcat Internal HTTP connector (used for cross-server requests)
*:14131         - 'Locking' port to determine if an internal service is running
*:56571         - EhCache RMI
*:56573         - RMI
*:62616         - ActiveMQ broker
*:5001          - SOAPMonitorService
*:8109          - Tomcat shutdown port
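
To test the "earlier bind" hunch on another machine, it can help to check which of these ports are already taken before Tomcat starts. The following is only a sketch: `PortProbe` and `isInUse` are hypothetical names, and the probe only tells you whether a listening socket can be bound on the wildcard address at that moment.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper for illustration; not part of Tomcat or the application.
final class PortProbe {
  /** Returns true if a listening socket cannot be bound to the port right now. */
  static boolean isInUse(int port) {
    try (ServerSocket probe = new ServerSocket(port)) {
      return false; // bind succeeded, so nothing else was listening here
    } catch (IOException e) {
      return true;  // typically a BindException: address already in use
    }
  }

  public static void main(String[] args) {
    // Ports taken from the list above.
    int[] ports = {32000, 10001, 2322, 56566, 8180, 8543, 2223,
                   14131, 56571, 56573, 62616, 5001, 8109};
    for (int port : ports) {
      System.out.println(port + (isInUse(port) ? " in use" : " free"));
    }
  }
}
```

Run it before and after startup; a port that is unexpectedly in use beforehand would support the hunch, while a clean result points elsewhere.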

Items ruled out

  • The most obvious solution: -Djava.net.preferIPv4Stack=true. I've always had that option configured
  • Any recent configuration change to our base application configuration, libraries, JVM options (there aren't any)
  • A JDK regression. I've tested JDK 1.7.0_09, 11, 15, 17 and 21 (the JDKs I've had installed on my machine for the duration)
  • Mac OS update. Mac OS 10.7.x and 10.8.0 through 10.8.3 are affected
  • File descriptor limits - increased from 5000 to 10000
  • Disabling IPv6 completely on the main ethernet interface
  • Setting breakpoints, and removing the first contexts to be affected by the SocketException (they're outgoing HTTP calls to web services). No change
  • Configuring /etc/hosts so the machine hostname resolves to localhost, and configuring JVM options to prefer IPv4 and to not prefer IPv6 addresses (This answer: https://stackoverflow.com/a/16318860/364206)

For those interested in hosts configuration, it's the same as default. I can reproduce this on a Fusion VM w/ a clean install of 10.8:

##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
##
127.0.0.1   localhost
255.255.255.255 broadcasthost
::1             localhost
fe80::1%lo0 localhost

Java code investigation

Because the issue appears to be time-sensitive, setting breakpoints to troubleshoot it prevents it from occurring. As requested in the comments, I also captured arg0 for SocksSocketImpl(PlainSocketImpl).socketAccept(SocketImpl); nothing seems out of the ordinary.

arg0  SocksSocketImpl  (id=460) 
  address InetAddress  (id=465) 
    canonicalHostName null  
    holder  InetAddress$InetAddressHolder  (id=475) 
      address 0 
      family  0 
      hostName  null  
  applicationSetProxy false 
  closePending  false 
  cmdIn null  
  cmdOut  null  
  cmdsock null  
  CONNECTION_NOT_RESET  0 
  CONNECTION_RESET  2 
  CONNECTION_RESET_PENDING  1 
  external_address  null  
  fd  FileDescriptor  (id=713)  
    fd  -1  
    useCount  AtomicInteger  (id=771) 
      value 0 
  fdLock  Object  (id=714)  
  fdUseCount  0 
  localport 0 
  port  0 
  resetLock Object  (id=716)  
  resetState  0 
  server  null  
  serverPort  1080  
  serverSocket  null  
  shut_rd false 
  shut_wr false 
  socket  Socket  (id=718)  
    bound false 
    closed  false 
    closeLock Object  (id=848)  
    connected false 
    created false 
    impl  null  
    oldImpl false 
    shutIn  false 
    shutOut false 
  socketInputStream null  
  stream  false 
  timeout 0 
  trafficClass  0 
  useV4 false 

I think all of the threads where the exceptions are thrown are victims of an earlier call, one that doesn't result in a SocketException so I haven't been able to catch it. Being able to start Tomcat by reducing startup times convinces me that the trigger is probably some scheduled task that performs a socket based operation, which then affects other socket operations.

That doesn't explain how or why this could affect several threads. No matter what we're doing to cause this condition, mysterious SocketExceptions shouldn't bubble up from native code simultaneously on multiple threads: two threads making outgoing web service calls, the Tomcat await call, and several TP processor threads, repeatedly.

JNI code investigation

Given the generic message, I assumed that an EINVAL error must be returned from one of the system calls in the socketAccept JNI code, so I traced the system calls leading up to the exception; no system call returns EINVAL. So I went to the OpenJDK sources looking for conditions in the socketAccept code that would set and then throw an EINVAL, but I couldn't find any code that sets errno to EINVAL, or that calls NET_ThrowByNameWithLastError, NET_ThrowCurrent or NET_ThrowNew in a way that would throw a SocketException with this default error message.

As for the system calls, we don't seem to get as far as the accept system call:

 PID/THRD        RELATIVE   ELAPSD    CPU SYSCALL(args)    = return
 6606/0x2c750d:  221538243       5      0 sigprocmask(0x1, 0x0, 0x14D8BE100)    = 0x0 0
 6606/0x2c750d:  221538244       3      0 sigaltstack(0x0, 0x14D8BE0F0, 0x0)     = 0 0
 6606/0x2c750d:  221538836      14     10 socket(0x2, 0x1, 0x0)    = 1170 0
 6606/0x2c750d:  221538837       3      0 fcntl(0x492, 0x3, 0x4)     = 2 0
 6606/0x2c750d:  221538839       3      1 fcntl(0x492, 0x4, 0x6)     = 0 0
 6606/0x2c750d:  221538842       5      2 setsockopt(0x492, 0xFFFF, 0x4)     = 0 0
 6606/0x2c750d:  221538852       7      4 bind(0x492, 0x14D8BE5D8, 0x10)     = 0 0
 6606/0x2c750d:  221538857       5      2 listen(0x492, 0x1, 0x4)    = 0 0
 6606/0x2c750d:  221539625       6      2 psynch_cvsignal(0x7FEFBFE00868, 0x10000000200, 0x100)    = 257 0
 6606/0x2c750d:  221539633       4      1 write(0x2, "Apr 18, 2013 11:05:35 AM org.apache.catalina.core.StandardServer await\nSEVERE: StandardServer.await: accept: \njava.net.SocketException: Invalid argument\n\tat java.net.PlainSocketImpl.socketAccept(Native Method)\n\tat java.net.PlainSocketImpl.socketAcce", 0x644)    = 1604 0

So, I think the problem occurs in the timeout handling code at the top of the accept loop in socketAccept, but I couldn't find any case where NET_Timeout would set errno to EINVAL and result in this SocketException being thrown. I'm referring to this code; I assume the jdk7u branch is for the most part what ships in the Oracle JDK:

Help!

I can't find anyone in the outside world affected by this particular problem on Mac OS, but almost everyone here is affected. There must be some application configuration that contributes, but I've exhausted every avenue I can think of to find the root cause.

Pointers on troubleshooting or insight on a possible cause would be much appreciated.

Accepted answer by Old Pro

Have you tried turning on JNI debugging with -Xcheck:jni? Interestingly, the Oracle documentation uses a PlainSocketImpl.socketAccept error as an example of when to use this.

Note also that the implication of Bug 7131399 is that the JNI code uses poll() on most platforms but select() on Mac OS, due to a problem with poll() on the Mac. So maybe select() is broken too. Digging in a bit further, select() will return EINVAL if "ndfs is greater than FD_SETSIZE and _DARWIN_UNLIMITED_SELECT is not defined." FD_SETSIZE is 1024, and it sounds like you have a ton of applications loading up, so perhaps it all comes down to waiting on more than 1024 FDs at one time.
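
One way to check this hypothesis from inside the JVM is to look at how many descriptors the process has open. The sketch below assumes a HotSpot/OpenJDK JVM on a Unix-like system, where the platform MXBean implements `com.sun.management.UnixOperatingSystemMXBean`; `FdHeadroom` and its methods are hypothetical names. The count is only a proxy: what matters to select() is the numeric value of the descriptor, and the values in the question (fd 1097 in the debugger dump, 0x492 = 1170 in the system call trace) are indeed above 1024.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper for illustration; not part of Tomcat or the JDK.
final class FdHeadroom {
  // FD_SETSIZE as quoted in the answer above.
  static final long FD_SETSIZE = 1024;

  /** Returns the number of open descriptors, or -1 if this JVM does not expose it. */
  static long openDescriptors() {
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    if (os instanceof UnixOperatingSystemMXBean) {
      return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
    }
    return -1;
  }

  /** True when a descriptor value or count has reached the select() limit. */
  static boolean atOrAboveSelectLimit(long fdOrCount) {
    return fdOrCount >= FD_SETSIZE;
  }

  public static void main(String[] args) {
    long open = openDescriptors();
    System.out.println("open descriptors: " + open);
    System.out.println("at or above FD_SETSIZE: " + atOrAboveSelectLimit(open));
  }
}
```

Logging this value periodically during startup (for example from a scheduled task) would show whether the failures coincide with the count crossing 1024.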

For extra credit, see if the related (supposedly fixed) Java bug is in fact fixed on your machine. The bug report has pointers to test cases.

Thanks to Old Pro's answer, I confirmed that the select() FD_SETSIZE limitation is the cause. I located an existing bug for this limitation:

https://bugs.openjdk.java.net/browse/JDK-8021820

The problem can be reproduced with the following code:

import java.io.*;
import java.net.*;

public class SelectTest {
  public static void main(String[] args) throws Exception {
    // Use 1024 file descriptors. There'll already be some in use, obviously, but this guarantees the problem will occur
    for(int i = 0; i < 1024; i++) {
      new FileInputStream("/dev/null");
    }
    ServerSocket socket = new ServerSocket(8080);
    socket.accept();
  }
}

Almost a year later, Java 7u60 has a fix for this problem:

http://www.oracle.com/technetwork/java/javase/2col/7u60-bugfixes-2202029.html

I also discovered that Tomcat's WebappClassLoader closes file handles after 90 seconds, which explains why setting breakpoints prevented the issue from occurring.

Answer by Clement

I had exactly the same issue (with Tomcat 7), and what seems to work for me is to tick the "Publish module contexts to separate XML files" option when I'm running Tomcat inside Eclipse. Have you tried that already?

Answer by Petro Semeniuk

I've been battling this problem in another context. The solution, combined from different sources, looks like this:

  • Update /etc/hosts with the following overrides:
    • ::1 EWD-MacBook-Pro.local
    • 127.0.0.1 EWD-MacBook-Pro.local localhost

(EWD-MacBook-Pro.local is my machine name)

and

  • Set system properties:
    • java.net.preferIPv4Stack => true
    • java.net.preferIPv6Addresses => false
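
These properties are normally passed on the command line as -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv6Addresses=false (for Tomcat, usually through JAVA_OPTS or CATALINA_OPTS). As far as I know the networking classes read them once when they initialize, so setting them later from application code is unlikely to help. A small sketch for confirming that the flags actually reached the JVM, with `NetPrefs` as a hypothetical name:

```java
// Hypothetical helper for illustration; it only reports what the JVM was given.
final class NetPrefs {
  /** Describes the two properties, showing "(unset)" when a flag was not passed. */
  static String describe() {
    return "java.net.preferIPv4Stack="
        + System.getProperty("java.net.preferIPv4Stack", "(unset)")
        + ", java.net.preferIPv6Addresses="
        + System.getProperty("java.net.preferIPv6Addresses", "(unset)");
  }

  public static void main(String[] args) {
    System.out.println(describe());
  }
}
```

Running it with the same options as Tomcat shows whether a wrapper script dropped or overrode either flag.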

Good luck!
