How to verify that a URL is valid in Java 1.6?

Notice: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3138941/

Time: 2020-10-30 00:31:03  Source: igfitidea

java url parsing

Asked by Bartłomiej Kalinowski

My application processes URLs entered manually by users. I have discovered that some malformed URLs (like 'http:/not-valid') result in a NullPointerException being thrown when the connection is opened. As I learned from this Java bug report, the issue is known and will not be fixed. The suggestion is to use java.net.URI, which is "more RFC 2396-conformant".

The question is: how can I use URI to work around the problem? The only thing I can do with URI is use it to parse the string and generate a URL. I have prepared the following program:

import java.net.*;

public class Test
{
    public static void main(String[] args)
    {
       try {
           URI uri = URI.create(args[0]);
           Object o = uri.toURL().getContent(); // try to get content
       }
       catch(Throwable e) {
           e.printStackTrace();
       }
    }
}

Here are the results of my tests (with Java 1.6.0_20), which are not much different from what I get with java.net.URL:

sh-3.2$ java Test url-not-valid
java.lang.IllegalArgumentException: URI is not absolute
        at java.net.URI.toURL(URI.java:1080)
        at Test.main(Test.java:9)
sh-3.2$ java Test http:/url-not-valid
java.lang.NullPointerException
        at sun.net.www.ParseUtil.toURI(ParseUtil.java:261)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:795)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1049)
        at java.net.URLConnection.getContent(URLConnection.java:688)
        at java.net.URL.getContent(URL.java:1024)
        at Test.main(Test.java:9)
sh-3.2$ java Test http:///url-not-valid
java.lang.IllegalArgumentException: protocol = http host = null
        at sun.net.spi.DefaultProxySelector.select(DefaultProxySelector.java:151)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:796)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1049)
        at java.net.URLConnection.getContent(URLConnection.java:688)
        at java.net.URL.getContent(URL.java:1024)
        at Test.main(Test.java:9)
sh-3.2$ java Test http:////url-not-valid
java.lang.NullPointerException
        at sun.net.www.ParseUtil.toURI(ParseUtil.java:261)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:795)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1049)
        at java.net.URLConnection.getContent(URLConnection.java:688)
        at java.net.URL.getContent(URL.java:1024)
        at Test.main(Test.java:9)

Accepted answer by Shekhar

You can use Apache Commons Validator:

UrlValidator urlValidator = new UrlValidator();
urlValidator.isValid("http://google.com");

http://commons.apache.org/validator/

http://commons.apache.org/validator/api-1.3.1/

Answer by Pete Kirkham

If I run your code with the type of malformed URI in the bug report, then it throws a URISyntaxException (wrapped in an IllegalArgumentException by URI.create). So the suggested fix does fix the reported error.

$ java -cp bin UriTest http:\\www.google.com\
java.lang.IllegalArgumentException
    at java.net.URI.create(URI.java:842)
    at UriTest.main(UriTest.java:8)
Caused by: java.net.URISyntaxException: Illegal character in opaque part at index 5: http:\www.google.com\
    at java.net.URI$Parser.fail(URI.java:2809)
    at java.net.URI$Parser.checkChars(URI.java:2982)
    at java.net.URI$Parser.parse(URI.java:3019)
    at java.net.URI.<init>(URI.java:578)
    at java.net.URI.create(URI.java:840)

Your type of malformed URI is different, and does not appear to be a syntax error.

Instead, catch the null pointer exception and recover with a suitable message.

You could try to be friendly and check whether the URI starts with a single slash ("http:/") and suggest the fix to the user, or you can check whether the hostname of the URL is non-empty:

import java.net.*;

public class UriTest
{
    public static void main ( String[] args )
    {
        try {
            URI uri = URI.create ( args[0] );

            // avoid null pointer exception
            if ( uri.getHost() == null )
                throw new MalformedURLException ( "no hostname" );

            URL url = uri.toURL();
            URLConnection s = url.openConnection();

            s.getInputStream();
        } catch ( Throwable e ) {
            e.printStackTrace();
        }
    }
}
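
The host check above can also be factored into a small reusable helper. The following is only a sketch, not a complete validator: the class name UrlCheck, the method name isValidHttpUrl, and the restriction to the http and https schemes are assumptions made for illustration. It uses only java.net.URI, so it inherits that class's RFC 2396 parsing rules.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (name and http/https restriction are assumptions).
final class UrlCheck {
    /** Returns true only if s parses as an absolute http/https URI that has a host. */
    static boolean isValidHttpUrl(String s) {
        if (s == null) {
            return false;
        }
        try {
            URI uri = new URI(s); // throws URISyntaxException on syntax errors
            String scheme = uri.getScheme();
            if (scheme == null) {
                return false; // relative URI such as "url-not-valid"
            }
            if (!scheme.equalsIgnoreCase("http") && !scheme.equalsIgnoreCase("https")) {
                return false;
            }
            // "http:/not-valid" and "http:///not-valid" parse, but have no host
            return uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String arg : args) {
            System.out.println(arg + " -> " + isValidHttpUrl(arg));
        }
    }
}
```

Calling the helper before opening a connection lets the application reject inputs such as http:/url-not-valid with a normal error message instead of relying on a caught NullPointerException.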

Answer by smola

Note that even with the approaches proposed in the other answers, you won't get validation right, since java.net.URI adheres to RFC 2396, which is notably outdated. With java.net.URI, you'll get exceptions for URLs that all web browsers accept today.
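
To illustrate the strictness described above, here is a minimal sketch. The class name StrictnessDemo and the example URLs are made up for illustration, and the claim that browsers accept an unencoded '|' in a query string is an assumption based on common browser behavior. java.net.URI rejects that character, while the percent-encoded form parses without error.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; the URLs below are illustrative only.
final class StrictnessDemo {
    /** Returns true if java.net.URI can parse the given string. */
    static boolean acceptedByUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // '|' is not a legal character for java.net.URI, so the first form is rejected
        System.out.println(acceptedByUri("http://example.com/search?q=a|b"));
        // the percent-encoded form of the same query is accepted
        System.out.println(acceptedByUri("http://example.com/search?q=a%7Cb"));
    }
}
```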

In order to solve these issues, I wrote a library for URL parsing in Java: galimatias. It performs URL parsing the same way web browsers do (adhering to the WHATWG URL Specification).

In your case, you can write:

try {
    // urlString is the user-supplied input string; the result is a java.net.URL
    java.net.URL url = io.mola.galimatias.URL.parse(urlString).toJavaURL();
} catch (GalimatiasParseException e) {
    // If this exception is thrown, the given URL contains an unrecoverable error. That is, it's completely invalid.
}

As a nice side effect, you get a lot of sanitization that you won't get with java.net.URI. For example, http:/example.com will be correctly parsed as http://example.com/.