Notice: this page is a Chinese-English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/18134718/

Time: 2020-08-11 22:39:00  Source: igfitidea

Java - Quickest way to check if URL exists

java url jsoup

Asked by Matt9Atkins

Hi, I am writing a program that goes through many different URLs and checks whether each one exists. Basically, I am checking whether the returned status code is 404 or not. Since I am checking over 1000 URLs, I want to be able to do this very quickly. The following is my code; I was wondering how I can modify it to run faster (if possible):

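import java.net.HttpURLConnection;
import java.net.URL;

// Open a connection and read the HTTP status code.
final URL url = new URL("http://www.example.com");
HttpURLConnection huc = (HttpURLConnection) url.openConnection();
int responseCode = huc.getResponseCode();

// Anything other than 404 is treated as an existing URL.
if (responseCode != 404) {
    System.out.println("GOOD");
} else {
    System.out.println("BAD");
}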

Would it be quicker to use JSoup?

I am aware that some sites return status code 200 and serve their own error page. However, I know the links I am checking don't do this, so handling that case is not needed.

Accepted answer by Vishnuprasad R

Try sending a "HEAD" request instead of a "GET" request. That should be faster, since the response body is not downloaded.

huc.setRequestMethod("HEAD");

Also, instead of checking whether the response status is not 404, check whether it is 200. That is, test for the positive case rather than the negative one. 404, 403, 402 ... all 40x statuses are nearly equivalent to an invalid, non-existent URL.
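
The two suggestions above (HEAD request, positive check for 200) could be combined roughly as follows. This is only a minimal sketch: the class name `UrlChecker` and the method `exists` are made up for illustration, and treating every I/O failure as "does not exist" is an assumption that may not suit every program.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class UrlChecker {

    // Returns true only if a HEAD request to the address answers with 200 OK.
    // Assumption: any failure (malformed URL, unknown host, refused
    // connection, non-HTTP protocol) is treated as "does not exist".
    static boolean exists(String address) {
        HttpURLConnection huc = null;
        try {
            URL url = new URL(address);
            huc = (HttpURLConnection) url.openConnection();
            huc.setRequestMethod("HEAD"); // headers only, no response body
            return huc.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException | ClassCastException e) {
            return false;
        } finally {
            if (huc != null) {
                huc.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(exists("http://www.example.com") ? "GOOD" : "BAD");
    }
}
```

Note that some servers may not handle HEAD the same way as GET, so it can be worth spot-checking a few known URLs before relying on this.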

You can also use multi-threading to make it even faster.
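
One way the multi-threading idea might look is a fixed-size thread pool that runs the same check on every URL. The sketch below is an assumption about how to structure it, not something given in the answer: `ParallelChecker`, `checkAll`, and the `check` parameter are hypothetical names, and the actual per-URL test (for example a HEAD request) is passed in by the caller.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

class ParallelChecker {

    // Runs `check` on every URL using a fixed-size thread pool and returns
    // the results keyed by URL, in input order. Duplicate URLs collapse
    // into a single map entry.
    static Map<String, Boolean> checkAll(List<String> urls,
                                         Predicate<String> check,
                                         int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit all checks first so they can run concurrently.
            List<Future<Boolean>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<Boolean> task = () -> check.test(url);
                futures.add(pool.submit(task));
            }
            // Then collect the results, blocking until each one is done.
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                results.put(urls.get(i), futures.get(i).get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while checking URLs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A URL check failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Since the work is mostly waiting on the network, a pool larger than the number of CPU cores is often reasonable, but the right size depends on the target servers and is best found by measuring.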

Answer by Spark8006

It seems you can set the timeout property; make sure the value is acceptable for your case. And if you have many URLs to test, check them in parallel; it will be much faster. Hope this helps.
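
As a rough illustration of the timeout suggestion, `HttpURLConnection` has `setConnectTimeout` and `setReadTimeout`, both in milliseconds. The helper below is a hypothetical sketch (`TimedConnection` and `open` are invented names) and the values in the usage line are arbitrary assumptions, not recommendations.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

class TimedConnection {

    // Prepares (but does not yet open) a HEAD request with explicit timeouts.
    // If a timeout expires later, getResponseCode() throws a
    // java.net.SocketTimeoutException, which is an IOException.
    static HttpURLConnection open(String address,
                                  int connectTimeoutMs,
                                  int readTimeoutMs) {
        try {
            HttpURLConnection huc =
                    (HttpURLConnection) new URL(address).openConnection();
            huc.setRequestMethod("HEAD");
            huc.setConnectTimeout(connectTimeoutMs); // time allowed to connect
            huc.setReadTimeout(readTimeoutMs);       // time allowed to wait for data
            return huc;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as `TimedConnection.open("http://www.example.com", 2000, 3000)` would then give up on a slow host after a few seconds instead of blocking the whole run.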

Answer by Khinsu

Try asking the nearest DNS server:

import java.net.InetAddress;
import java.net.UnknownHostException;

class DNSLookup
{
    public static void main(String args[])
    {
        String host = "stackoverflow.com";
        try
        {
            // Resolve the host name; this confirms only that the host exists in DNS,
            // not that a particular URL path on it exists.
            InetAddress inetAddress = InetAddress.getByName(host);
            // show the Internet Address as name/address
            System.out.println(inetAddress.getHostName() + " " + inetAddress.getHostAddress());
        }
        catch (UnknownHostException exception)
        {
            // Thrown when no IP address can be found for the host
            System.err.println("ERROR: No DNS record for '" + host + "'");
            exception.printStackTrace();
        }
    }
}
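
To apply the DNS idea to full URLs rather than bare host names, the host part can be extracted first. The following is a sketch under assumptions: `HostFilter` and `hostResolves` are hypothetical names, and using the lookup as a cheap pre-filter before an HTTP request is one possible reading of the suggestion. A successful lookup only shows that the host resolves; the page itself may still return 404.

```java
import java.net.InetAddress;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.UnknownHostException;

class HostFilter {

    // Returns true if the host part of the URL can be resolved to an
    // IP address. This says nothing about the path on that host.
    static boolean hostResolves(String address) {
        try {
            String host = new URL(address).getHost();
            if (host.isEmpty()) {
                // InetAddress.getByName("") would return the loopback address
                return false;
            }
            InetAddress.getByName(host);
            return true;
        } catch (MalformedURLException | UnknownHostException e) {
            return false;
        }
    }
}
```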