Java: I get a SocketTimeoutException in Jsoup: Read timed out
Original question: http://stackoverflow.com/questions/6571548/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by C. Maillard
I get a SocketTimeoutException when I try to parse a lot of HTML documents using Jsoup.
For example, I have a list of links:
<a href="www.domain.com/url1.html">link1</a>
<a href="www.domain.com/url2.html">link2</a>
<a href="www.domain.com/url3.html">link3</a>
<a href="www.domain.com/url4.html">link4</a>
For each link, I parse the document the URL points to (taken from the href attribute) to get other pieces of information from those pages.
So I can imagine that it takes a lot of time, but how can I get rid of this exception?
Here is the whole stack trace:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.HttpURLConnection.getResponseCode(Unknown Source)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:381)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:132)
at app.ForumCrawler.crawl(ForumCrawler.java:50)
at Main.main(Main.java:15)
Thank you buddies!
EDIT: Hmm... Sorry, I just found the solution:
Jsoup.connect(url).timeout(0).get();
Hope that could be useful for someone else... :)
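For readers wondering where the exception comes from: the trace in the question ends in HttpURLConnection.getResponseCode, which suggests the client connected fine but the server did not send its response within the read timeout. Below is a minimal sketch that reproduces the same kind of exception with plain JDK classes (no Jsoup), assuming a local ServerSocket that lets the TCP connection complete and then stays silent. The names ReadTimeoutDemo and triggerReadTimeout are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;

class ReadTimeoutDemo {

    /**
     * Sends an HTTP request to a local server that never answers and returns
     * the message of the resulting SocketTimeoutException.
     */
    static String triggerReadTimeout(int readTimeoutMillis) {
        // Port 0 lets the OS pick a free port. The server never calls accept(),
        // but the OS backlog still completes the TCP handshake, so the client
        // connects and then waits for a response that never arrives.
        try (ServerSocket silentServer = new ServerSocket(0)) {
            URL url = URI.create("http://127.0.0.1:" + silentServer.getLocalPort() + "/").toURL();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(2000);           // time allowed to establish the connection
            conn.setReadTimeout(readTimeoutMillis); // time allowed to wait for response data
            try {
                conn.getResponseCode();             // blocks while waiting for the response headers
                return "no timeout";
            } catch (SocketTimeoutException e) {
                return e.getMessage();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(triggerReadTimeout(500));
    }
}
```

Raising the read timeout (or setting it to 0) only changes how long the client is willing to wait; it does not make a slow server answer any faster.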
Answered by MarcoS
I think you can do
Jsoup.connect("...").timeout(10 * 1000).get();
which sets the timeout to 10 seconds.
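Since timeout takes its argument in milliseconds, the 10 * 1000 above is just a unit conversion. If you prefer to spell out the unit, here is a small sketch using java.time.Duration; the helper name toTimeoutMillis is hypothetical and not part of Jsoup.

```java
import java.time.Duration;

class TimeoutMillis {

    /**
     * Converts a Duration into the int millisecond count expected by a
     * timeout(int millis) style method. Throws ArithmeticException if the
     * value does not fit into an int.
     */
    static int toTimeoutMillis(Duration duration) {
        if (duration.isNegative()) {
            throw new IllegalArgumentException("timeout must not be negative");
        }
        return Math.toIntExact(duration.toMillis());
    }

    public static void main(String[] args) {
        System.out.println(toTimeoutMillis(Duration.ofSeconds(10)));
    }
}
```

The result could then be passed along as in Jsoup.connect("...").timeout(millis).get().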
Answered by amaidment
OK, so I tried to offer this as an edit to MarcoS's answer, but the edit was rejected. Nevertheless, the following information may be useful to future visitors:
According to the javadocs, the default timeout for an org.jsoup.Connection is 30 seconds.
As has already been mentioned, this can be set using timeout(int millis).
Also, as the OP notes in the edit, this can also be set using timeout(0). However, as the javadocs state:
A timeout of zero is treated as an infinite timeout.
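Because an infinite timeout can leave a crawler hanging forever on one unresponsive page, an alternative worth considering is a finite timeout plus a few retries, skipping the link if every attempt times out. The following is only a sketch under that assumption: the Fetcher interface and fetchWithRetry are hypothetical names, not Jsoup API, and the download itself is left abstract so the example runs with the JDK alone.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.SocketTimeoutException;
import java.util.Optional;

class RetryingFetcher {

    /** Hypothetical abstraction over "download this URL"; not part of Jsoup. */
    interface Fetcher<T> {
        T fetch(String url) throws IOException;
    }

    /**
     * Calls the fetcher up to maxAttempts times, retrying only when the attempt
     * fails with a SocketTimeoutException. Returns an empty Optional if every
     * attempt timed out; any other IOException is rethrown unchecked.
     */
    static <T> Optional<T> fetchWithRetry(Fetcher<T> fetcher, String url, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return Optional.ofNullable(fetcher.fetch(url));
            } catch (SocketTimeoutException e) {
                // Timed out: log it and let the loop try again.
                System.err.println("Timeout on attempt " + attempt + " for " + url);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return Optional.empty();
    }
}
```

With Jsoup, the fetcher could be something like u -> Jsoup.connect(u).timeout(10 * 1000).get(), and the crawl loop would simply skip links for which the result is empty.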
Answered by Gaurab Pradhan
Set a timeout when connecting with jsoup.
Answered by Bartek
There is a mistake at https://jsoup.org/apidocs/org/jsoup/Connection.html. The default timeout is not 30 seconds; it is 3 seconds. Just look at the javadoc in the source code: it says 3000 ms.
Answered by invzbl3
I had the same error:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
and only setting .userAgent("Opera") worked for me.
So I used the Connection userAgent(String userAgent) method of the Connection class to set the Jsoup user agent.
Something like:
Jsoup.connect("link").userAgent("Opera").get();
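To check what a server actually receives when the user agent is overridden, here is a rough JDK-only sketch. It assumes a throwaway local com.sun.net.httpserver.HttpServer that echoes the User-Agent header back, and it uses HttpURLConnection (which appears in the stack trace of the question) rather than Jsoup itself. UserAgentDemo and echoUserAgent are illustrative names.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class UserAgentDemo {

    /** Sends a request with the given User-Agent and returns what the server saw. */
    static String echoUserAgent(String userAgent) {
        HttpServer server = null;
        try {
            // Local test server on a free port that replies with the User-Agent it received.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                String seen = String.valueOf(exchange.getRequestHeaders().getFirst("User-Agent"));
                byte[] body = seen.getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();

            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            conn.setRequestProperty("User-Agent", userAgent); // replaces the default Java agent string
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(echoUserAgent("Opera"));
    }
}
```

Whether a different user agent helps depends on the site: some servers respond slowly or not at all to clients they do not recognize, which would then show up on the client as a read timeout.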
Answered by Prasanna Mendon
This should work:
Jsoup.connect(url.toLowerCase()).timeout(0);