Java socketRead0 Issue

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/12544212/

Time: 2020-10-31 09:19:08  Source: igfitidea

java sockets

Asked by John

I'm developing a web crawler with htmlunit and I have added all the required timeouts, but I notice that the app hangs when the server of a website being crawled is not responding. When I use Java VisualVM to take a thread dump, I see:

java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:88)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:429)
at java.net.Socket.connect(Socket.java:525)
at com.gargoylesoftware.htmlunit.SocksSocketFactory.connectSocket(SocksSocketFactory.java:89)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:573)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:776)
at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:152)
at app.plugin.core.net.QHttpWebConnection.getResponse(QHttpWebConnection.java:30)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1439)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1358)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:307)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:373)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:358)

This is really frustrating since I have no control of those servers. This issue is seriously affecting the performance of my application.

Question:

  1. How can I solve this issue?
  2. Is there a way to get a list of the socket connections opened by a Java app and use it to terminate a socket, for example to simulate the server closing the connection?

Answered by Geoff

I believe that when you are in a Java native method, the stack trace will say RUNNABLE even if the call is actually blocked waiting for some event. In essence, I don't believe Java has any way of knowing what a native method is actually doing, so it flags these calls as RUNNABLE. I have seen this with socketRead0() and socketAccept() -- both of which typically block.
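
The point above can be checked with a small experiment. This is only a sketch: the class and method names are made up for illustration, and it uses accept() on a local ServerSocket rather than the socketRead0() call from the question. It starts a thread that blocks in accept() because nobody ever connects, then asks the JVM which state it reports for that thread.

```java
import java.io.IOException;
import java.net.ServerSocket;

class BlockedStateDemo {
    // Starts a thread that blocks in accept() and returns the state the JVM reports for it.
    static Thread.State stateWhileBlockedInAccept() throws IOException, InterruptedException {
        try (ServerSocket server = new ServerSocket(0)) { // port 0: let the OS pick a free port
            Thread acceptor = new Thread(() -> {
                try {
                    server.accept(); // blocks: nobody ever connects
                } catch (IOException expected) {
                    // thrown once the server socket is closed at the end of the try block
                }
            });
            acceptor.setDaemon(true);
            acceptor.start();
            Thread.sleep(500); // give the thread time to reach accept()
            return acceptor.getState();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("State while blocked in accept(): " + stateWhileBlockedInAccept());
    }
}
```

If the reported state is RUNNABLE even though the thread is doing nothing but waiting, that matches the behaviour described in this answer.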

You need to set your timeout to a reasonable length of time such that your request will time out if the server is not responding but not too short in case the server is simply busy. Your application should be written to use multiple threads. I would try running a dozen or more threads and have each thread wait up to five or ten seconds for a response. There is virtually no overhead in having a handful of threads waiting. You should also be mindful of not bombarding a server with lots of requests when writing a web spider.
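
A minimal sketch of the multi-threaded approach, under some assumptions: slowFetch is a hypothetical stand-in that just sleeps to imitate waiting on a slow server, the URLs are placeholders, and the pool size of 12 is only the "dozen" suggested above. A real crawler would call its HTTP client (with timeouts set) inside the task instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CrawlerPoolDemo {
    // Hypothetical stand-in for one page fetch that spends waitMs waiting on the server.
    static String slowFetch(String url, long waitMs) throws InterruptedException {
        Thread.sleep(waitMs);
        return "fetched " + url;
    }

    // Runs all fetches on a fixed-size pool so that the waits overlap instead of adding up.
    static List<String> fetchAll(List<String> urls, int threads, long waitMs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> slowFetch(url, waitMs));
            }
            List<String> results = new ArrayList<>();
            // invokeAll returns the futures in the same order as the tasks.
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            urls.add("http://example.com/" + i); // placeholder URLs
        }
        long start = System.nanoTime();
        List<String> results = fetchAll(urls, 12, 300);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(results.size() + " pages in about " + elapsedMs + " ms");
    }
}
```

With twelve threads, twelve fetches that each wait 300 ms should finish in roughly the time of one wait rather than twelve, which is why a handful of waiting threads is cheap compared with crawling sequentially.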

Answered by hyde

Here's a blog post which is possibly related: http://javaeesupportpatterns.blogspot.fi/2011/04/javanetsocketinputstreamsocketread0.html

In short, the solution is to make sure that a socket timeout is defined. The default is 0, meaning no timeout. Exactly how to set it depends on the library, in this case apparently com.gargoylesoftware.htmlunit. At a quick glance, the correct method might be com.gargoylesoftware.htmlunit.WebClient.setTimeout.
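
To illustrate what a socket timeout does at the java.net level, here is a sketch that does not use htmlunit at all. It assumes a local "silent" server (a ServerSocket that never writes anything) in place of an unresponsive website, and the class and method names are made up for the example. Without setSoTimeout the read() call would block indefinitely; with it, the read fails with SocketTimeoutException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutDemo {
    // Connects to a local server that never sends data and reports whether read() timed out.
    static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            // Bound the time spent establishing the connection.
            client.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()),
                    connectTimeoutMs);
            // Bound each blocking read; the default of 0 means wait forever.
            client.setSoTimeout(readTimeoutMs);
            InputStream in = client.getInputStream();
            try {
                in.read(); // the server never writes, so only the timeout can end this call
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(1000, 300));
    }
}
```

A library-level setting such as the WebClient timeout mentioned above is presumably passed down to calls like these, but that is an assumption about htmlunit's internals, not something this sketch verifies.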

Answered by sken130

If your Java server is on Windows, your last resort is SysInternals TCPView.

http://technet.microsoft.com/en-us/sysinternals/bb897437.aspx

From it you will see the list of all processes and all local and remote ports, which will include your Java app. You will have to pick the correct connection to close, and after that, the Java Thread will throw an exception and end.

Of course, there's a risk of closing the wrong connection. After all, this method is the last resort.

Update on 23 Aug 2019:

TCPView is slow when there are a large number of connections.

The much faster alternative is CurrPorts (from NirSoft): https://www.nirsoft.net/utils/cports.html
