java.io.IOException: Server returned HTTP response code: 503 for URL: Error

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/21447862/

Time: 2020-08-13 08:46:50  Source: igfitidea

java html

Asked by user3251567

I'm scraping data from a website by getting the HTML code from the website then parsing it in Java.

I'm currently using java.net.URL as well as java.net.URLConnection. This is the code I use to get the HTML code from a given website (found on this website, slightly edited to fit my needs):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public static String getURL(String name) throws Exception {

    // Open a connection to the given URL
    URL url = new URL(name);
    URLConnection spoof = url.openConnection();

    // Spoof the connection so we look like a web browser
    spoof.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)");

    StringBuilder s = new StringBuilder();

    // try-with-resources closes the reader (and the underlying stream) when done
    try (BufferedReader in = new BufferedReader(new InputStreamReader(spoof.getInputStream()))) {
        String strLine;

        // Loop through every line in the source
        while ((strLine = in.readLine()) != null) {

            // Append each line to the result
            s.append(strLine).append("\n");
        }
    }
    return s.toString();
}

When I run it, the HTML code is received correctly for about 100-200 webpages. However, before I am done grabbing the HTML code, I get a "java.io.IOException: Server returned HTTP response code: 503 for URL" exception. I've researched this topic thoroughly, and other questions like this one do not cover the package I am using.
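
For context, HTTP 503 means "Service Unavailable", and a server may send it together with a Retry-After header. The following is only a minimal sketch of one way to detect the 503 and wait before retrying, assuming the failures are temporary. The class name RetryingFetcher, the helper backoffMillis, and the values MAX_ATTEMPTS and BASE_DELAY_MS are hypothetical and do not come from the original post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

class RetryingFetcher {

    // Hypothetical tuning values, not taken from the original post
    static final int MAX_ATTEMPTS = 5;
    static final long BASE_DELAY_MS = 1000;

    // Delay before the next attempt: use a numeric Retry-After header (seconds)
    // if present, otherwise double the base delay on each attempt.
    static long backoffMillis(int attempt, String retryAfter) {
        if (retryAfter != null) {
            try {
                return Long.parseLong(retryAfter.trim()) * 1000;
            } catch (NumberFormatException e) {
                // Retry-After may also be an HTTP date; fall back to exponential backoff
            }
        }
        return BASE_DELAY_MS << attempt;
    }

    static String fetch(String name) throws IOException, InterruptedException {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            HttpURLConnection conn = (HttpURLConnection) new URL(name).openConnection();
            conn.setRequestProperty("User-Agent",
                    "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)");

            // Reading the status code does not throw on a 503, unlike getInputStream()
            int status = conn.getResponseCode();
            if (status == HttpURLConnection.HTTP_UNAVAILABLE) { // 503
                String retryAfter = conn.getHeaderField("Retry-After");
                conn.disconnect();
                Thread.sleep(backoffMillis(attempt, retryAfter));
                continue;
            }

            StringBuilder s = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    s.append(line).append("\n");
                }
            }
            return s.toString();
        }
        throw new IOException("Still getting 503 after " + MAX_ATTEMPTS + " attempts: " + name);
    }
}
```

Calling RetryingFetcher.fetch(name) in place of getURL(name) would then pause between attempts instead of failing on the first 503. Whether this helps depends on why the server is refusing requests; adding a short pause between every page request may also be worth trying.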

Thanks in advance for the help!

Answered by Vlad Sonkin

Maybe the server has limits. In this case you can try using a Socket with its input/output streams instead of URLConnection.
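
A rough sketch of what the Socket approach could look like, assuming a plain HTTP site on port 80 (an HTTPS site would need an SSL socket instead). The class name RawHttpGet and its methods are hypothetical. The request uses HTTP/1.0 so that a compliant server will not reply with chunked transfer encoding and will close the connection once the response is complete.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RawHttpGet {

    // Builds a minimal GET request; the blank line at the end terminates the headers
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)\r\n"
             + "\r\n";
    }

    // Returns the raw response: status line, headers, and body
    static String get(String host, String path) throws IOException {
        try (Socket socket = new Socket(host, 80)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            StringBuilder response = new StringBuilder();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            String line;
            // Read until the server closes the connection
            while ((line = in.readLine()) != null) {
                response.append(line).append("\n");
            }
            return response.toString();
        }
    }
}
```

The first line of the returned text is the status line, so a 503 would still be visible there. Since the 503 is sent by the server, switching from URLConnection to a Socket does not by itself lift a server-side limit.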