How to parse a URI like this in Java

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1828641/

Date: 2020-08-12 23:02:09  Source: igfitidea

Tags: java, parsing, uri

Asked by Frank

I'm trying to parse the following URI: http://translate.google.com/#zh-CN|en|你

but got this error message:

java.net.URISyntaxException: Illegal character in fragment at index 34: http://translate.google.com/#zh-CN|en|你
        at java.net.URI$Parser.fail(URI.java:2809)
        at java.net.URI$Parser.checkChars(URI.java:2982)
        at java.net.URI$Parser.parse(URI.java:3028)

It's having a problem with the "|" character. If I get rid of the "|", the last Chinese character does not cause any problem. What's the right way to handle this?

My method looks like this:

  public static void displayFileOrUrlInBrowser(String File_Or_Url)
  {
    try { Desktop.getDesktop().browse(new URI(File_Or_Url.replace(" ","%20").replace("^","%5E"))); }
    catch (Exception e) { e.printStackTrace(); }
  }

Thanks for the answers, but BalusC's solution seems to work only for one specific URL. My method needs to work with any URL I pass to it. How would it know where to split the URL into two parts so that only the second part is encoded?

Accepted answer by Spike Williams

The pipe character is "considered unsafe" for use in URLs. You can fix it by replacing the | with its percent-encoded hex equivalent, which is "%7C".

However, replacing individual characters in a URL is a brittle solution that does not work very well when you consider that, in any given URL, there could potentially be quite a number of different characters that may need to be replaced. You are already replacing spaces, carets, and pipes.... but what about brackets, and accent marks, and quotation marks? Or question marks and ampersands, which may or may not be valid parts of a URL, depending on how they are used?

Thus, a superior solution would be to use the language's facility for encoding URLs, rather than doing it manually. In the case of Java, use URLEncoder, as per the example in BalusC's answer to this question.

Answer by Frank

All right, I found how to do it, like this:

try { Desktop.getDesktop().browse(new URI(File_Or_Url.replace(" ","%20").replace("^","%5E").replace("|","%7C"))); }
catch (Exception e) { e.printStackTrace(); }

Answer by Geo

Aren't you better off using URLEncoder than selectively encoding stuff?

Answer by BalusC

You should use java.net.URLEncoder to URL-encode the query with UTF-8. You don't necessarily need a regex for this. You don't want to have a regex to cover all of those thousands of Chinese glyphs, do you? ;)

String query = URLEncoder.encode("zh-CN|en|你", "UTF-8");
String url = "http://translate.google.com/#" + query;
Desktop.getDesktop().browse(new URI(url));    

Answer by Federico Pugnali

The URLEncoder solution didn't work for me, maybe because it encodes just about everything. I was trying to use Apache's HttpGet, and it throws an error when given a URL string encoded like that.
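
To illustrate the "encodes just everything" point, here is a minimal sketch comparing URLEncoder applied to the fragment alone with URLEncoder applied to the whole URL. The class name UrlEncoderScopeSketch and the helper encodeComponent are made up for this illustration; only URLEncoder.encode itself comes from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of any answer in this thread.
final class UrlEncoderScopeSketch {

    // Form-encodes one string as UTF-8; wraps the checked exception for brevity.
    static String encodeComponent(String component) {
        try {
            return URLEncoder.encode(component, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    public static void main(String[] args) {
        // "\u4f60" is the Chinese character from the question, written as an
        // escape so the source file encoding does not matter.
        String fragment = "zh-CN|en|\u4f60";

        // Encoding only the fragment escapes the pipe and the Chinese character.
        System.out.println(encodeComponent(fragment));

        // Encoding the whole URL also escapes ':', '/' and '#', so the result
        // is no longer usable as a URL.
        System.out.println(encodeComponent("http://translate.google.com/#" + fragment));
    }
}
```

The second call is likely what trips up clients such as HttpGet: the scheme and path delimiters are escaped too, so URLEncoder is only suitable for individual components, not for a complete URL.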

The correct way in my case was this strange code:

URL url = new URL(pageURLAsUnescapedString);
URI uri = new URI(url.getProtocol(), url.getAuthority(), url.getPath(), url.getQuery(), url.getRef());

Somehow url.toURI() does not work the same way. URI constructors work in two ways. The constructor with a single String parameter assumes the provided URI is already correctly escaped (hence the error; the same happens with the String constructor of HttpGet). The constructors with multiple String parameters handle unescaped input very well (and HttpGet has another constructor that accepts a URI). Why doesn't URL.toURI() do this? I have no clue...
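
The difference between the two kinds of URI constructor can be sketched with the URL from the question. The class UriConstructorSketch and its two helpers are hypothetical names introduced here; the sketch only assumes the standard java.net.URL and java.net.URI classes (note that new URL(String) is deprecated in recent JDKs, though it still works).

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical demo class, not part of the original answer.
final class UriConstructorSketch {

    // Splits an unescaped URL string with java.net.URL, then lets the
    // five-argument URI constructor quote the illegal characters.
    static URI toUri(String unescaped) {
        try {
            URL url = new URL(unescaped);
            return new URI(url.getProtocol(), url.getAuthority(),
                           url.getPath(), url.getQuery(), url.getRef());
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Not a usable URL: " + unescaped, e);
        }
    }

    // Reports whether the single-String constructor accepts the input as-is.
    static boolean singleArgConstructorAccepts(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "\u4f60" is the Chinese character from the question.
        String raw = "http://translate.google.com/#zh-CN|en|\u4f60";

        // The single-String constructor rejects the unescaped pipe.
        System.out.println(singleArgConstructorAccepts(raw));

        // The multi-argument constructor quotes the pipe; toASCIIString()
        // additionally percent-encodes the non-ASCII character.
        System.out.println(toUri(raw).toASCIIString());
    }
}
```

Because java.net.URL does the splitting, this approach does not need to know in advance where the fragment or query starts, which is what the follow-up in the question asked about.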

Hope it helps someone, it took me some hours to figure it out.

Answer by Gili

Taking the best of Federico's answer and Marek's answer, you need to do the following:

URL url = new URL(pageURLAsUnescapedString);

// URI's constructor expects the path, query string and fragment to be decoded.
// If we do not decode them, we will end up with double-encoding.
String path = url.getPath();
if (path != null)
  path = URLDecoder.decode(path, "UTF-8");
String query = url.getQuery();
if (query != null)
  query = URLDecoder.decode(query, "UTF-8");
String fragment = url.getRef();
if (fragment != null)
  fragment = URLDecoder.decode(fragment, "UTF-8");

URI uri = new URI(url.getProtocol(), url.getAuthority(), path, query, fragment);
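
The double-encoding that the comment above warns about can be seen in a small sketch. The host example.com, the class name DoubleEncodingSketch, and the helpers buildUri and decode are assumptions made for illustration only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of the original answer.
final class DoubleEncodingSketch {

    // Builds a URI for a placeholder host from the given path, using the
    // multi-argument constructor, which always quotes '%'.
    static String buildUri(String path) {
        try {
            return new URI("http", "example.com", path, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Decodes percent-escapes as UTF-8; wraps the checked exception for brevity.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    public static void main(String[] args) {
        String alreadyEscaped = "/a%20b";

        // Passing an escaped path straight through escapes the '%' again.
        System.out.println(buildUri(alreadyEscaped));

        // Decoding first lets the constructor escape the space exactly once.
        System.out.println(buildUri(decode(alreadyEscaped)));
    }
}
```

One caveat with this approach: URLDecoder performs form decoding, so it also turns a literal '+' into a space. For URLs whose path or fragment may contain '+', that step might need different handling.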

Answer by vaquar khan

First encode your URL as in the following example, then pass the URL into the method.

        JSONObject json = new JSONObject();
        json.put("name", "vaquar");
        json.put("age", "30");
        json.put("address", "asasbsa bajsb ");


        System.out.println("in sslRestClientGETRankColl"+json.toString());

        String createdJson=json.toString();

        createdJson= URLEncoder.encode(createdJson, "UTF-8");

// Now call the method: displayFileOrUrlInBrowser(createdJson);

public static void displayFileOrUrlInBrowser(String File_Or_Url)
  {
    // Desktop.browse() takes a java.net.URI, not a String, so wrap the argument.
    try { Desktop.getDesktop().browse(new URI(File_Or_Url)); }
    catch (Exception e) { e.printStackTrace(); }
  }