Java: How to determine whether a string has already been URL encoded?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2295223/

Time: 2020-08-13 05:47:42  Source: igfitidea

How to find out if string has already been URL encoded?

java utf-8 url-encoding

Asked by Trick

How can I check whether a string has already been encoded?

For example, if I encode TEST==, I get TEST%3D%3D. If I encode that result again, I get TEST%253D%253D. So before encoding, I would have to know whether the string is already encoded...

I have encoded parameters saved, and I need to search for them. I don't know whether the input parameters will be encoded or not, so I have to work out whether to encode or decode them before searching.

Accepted answer by SF.

Decode, then compare to the original. If the result differs, the original was encoded. If it doesn't differ, the original wasn't encoded. That still says nothing about whether the newly decoded version is itself still encoded, so this is a good task for recursion.

I hope one can't write a quine in urlencode, or this algorithm would get stuck.

Exception: when a string contains a "+" character, the URL decoder replaces it with a space even if the string is not URL encoded, so the comparison reports a false positive.
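The decode-and-compare idea can be sketched in Java roughly as follows. This is only an illustration, not the answerer's code: the class name `DecodeCompare`, the method names, and the `maxRounds` cap are made up for the example, and it assumes Java 10+ for `URLDecoder.decode(String, Charset)`. A loop with a cap stands in for the recursion, and malformed escapes such as "%zz" are treated as "not encoded".

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class DecodeCompare {

    // True if decoding changes the string, i.e. it "looks" URL encoded.
    // Caveat from the answer: a literal "+" also changes (it becomes a space).
    static boolean looksEncoded(String s) {
        try {
            return !URLDecoder.decode(s, StandardCharsets.UTF_8).equals(s);
        } catch (IllegalArgumentException e) {
            // Malformed escape such as "%zz": cannot be a valid encoding.
            return false;
        }
    }

    // Decodes repeatedly until the string stops changing (hypothetical helper).
    // maxRounds guards against looping forever on unexpected input.
    static String fullyDecode(String s, int maxRounds) {
        String current = s;
        for (int i = 0; i < maxRounds; i++) {
            String decoded;
            try {
                decoded = URLDecoder.decode(current, StandardCharsets.UTF_8);
            } catch (IllegalArgumentException e) {
                return current; // stop at the first malformed layer
            }
            if (decoded.equals(current)) {
                return current; // nothing left to decode
            }
            current = decoded;
        }
        return current;
    }
}
```

With the question's example, `fullyDecode("TEST%253D%253D", 5)` peels both layers and yields `TEST==`, while `looksEncoded("a+b")` returns true even though nothing was encoded, which is exactly the "+" exception noted above.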

Answer by flybywire

You can't know for sure, unless your strings conform to a certain pattern or you keep track of which strings you have encoded. As you noted yourself, a string that is already encoded can be encoded again, so you can't be 100% sure just by looking at the string itself.

Answer by Roman

Use a regular expression to check whether your string contains illegal characters (i.e. characters that cannot appear in a URL-encoded string, such as whitespace).
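One possible reading of this suggestion, sketched in Java: accept a string only if it consists entirely of RFC 3986 unreserved characters and well-formed %XX escapes. The class and method names are invented for the example, and the check is meant for a single component such as a parameter value, not a full URL. It can only show that a string is consistent with being encoded; a plain word like "TEST" passes too.

```java
import java.util.regex.Pattern;

final class EncodedPatternCheck {

    // Unreserved characters (letters, digits, '-', '.', '_', '~')
    // or a percent sign followed by exactly two hex digits.
    private static final Pattern ENCODED_COMPONENT =
            Pattern.compile("(?:[A-Za-z0-9._~-]|%[0-9A-Fa-f]{2})*");

    // True if the value contains nothing that an encoder would have escaped.
    static boolean isConsistentWithEncoded(String value) {
        return ENCODED_COMPONENT.matcher(value).matches();
    }
}
```

For instance, `isConsistentWithEncoded("TEST%3D%3D")` is true, while "TEST==", "a b" and "100%" are all rejected because they contain characters or escapes that an encoded component could not have.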

Answer by Padmarag

Joel on Software had a solution for this some time back: http://www.joelonsoftware.com/articles/Wrong.html
Or you may add some prefix to the strings.

Answer by amit_saxena

Try decoding the URL. If the resulting string is shorter than the original, the original URL was already encoded. Otherwise you can safely encode it: either it is not encoded, or encoding leaves it unchanged, so encoding again will not produce a wrong URL. Below is sample pseudo (inspired by Ruby) code:

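    # Returns the encoded URL for any given URL after determining whether it is already encoded.
    # Note: URI.escape and URI.unescape were removed in Ruby 3.0, so on newer versions read this as pseudocode.
    def escape(url)
      unescaped_url = URI.unescape(url)
      if unescaped_url.length < url.length
        # Decoding shortened the string, so it was already encoded.
        return url
      else
        return URI.escape(url)
      end
    end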

Answer by jschnasse

Check your URL for suspicious characters[1]. List of candidates:

WHITE_SPACE, ", <, >, {, }, |, \, ^, ~, [, ], and `

I use:

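private static boolean isAlreadyEncoded(String passedUrl) {
        // Unsafe characters: space, ", <, >, {, }, |, \, ^, ~, [ and ].
        // If any of them is present, the URL cannot already be encoded.
        // (Escapes such as "\<" are not legal in a Java string literal, so only
        // the backslash and the square brackets are escaped here.)
        return !passedUrl.matches(".*[ \"<>{}|\\\\^~\\[\\]].*");
}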

For the actual encoding I proceed with:

https://stackoverflow.com/a/49796882/1485527

Note: Even if your URL doesn't contain unsafe characters, you might still want to apply further encoding, e.g. Punycode encoding of the host name. So there is still plenty of room for additional checks.

[1] A list of candidates can be found in the "unsafe" section of the URL spec, on page 2. In my understanding, '%' and '#' should be left out of the encoding check, since these characters can occur in encoded URLs as well.

Answer by esergion

If you want to be sure that a string is encoded correctly (if it needs to be encoded), just decode it and then encode it once again.

metacode:

100%_correctly_encoded_string = encode(decode(input_string))

An already encoded string will remain untouched. An unencoded string will be encoded. A string with only URL-allowed characters will remain untouched too.
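In Java, the metacode above might look like the sketch below. The class and method names are hypothetical, and it assumes Java 10+ for the `Charset` overloads of `URLDecoder.decode` and `URLEncoder.encode`. Input with a malformed escape (for example a bare "%") is treated as unencoded, which is a choice made for this example rather than part of the answer.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class EncodeOnce {

    // encode(decode(input)): decode one layer if possible, then encode once.
    static String encodeOnce(String input) {
        String decoded;
        try {
            decoded = URLDecoder.decode(input, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            // Malformed escape: assume the input was never encoded.
            decoded = input;
        }
        return URLEncoder.encode(decoded, StandardCharsets.UTF_8);
    }
}
```

Both `encodeOnce("TEST==")` and `encodeOnce("TEST%3D%3D")` give `TEST%3D%3D`. Two limits are worth keeping in mind: `URLEncoder` produces form-style output (a space becomes "+"), and an unencoded string that contains a literal "+" or a valid-looking escape is decoded first, so its original meaning is lost.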

Answer by Luke Mlsna

According to the spec (https://tools.ietf.org/html/rfc3986) all URLs MUST start with a scheme followed by a :

Since colons are required as the delimiter between a scheme and the rest of the URI, any string that contains a colon is not encoded.

(This assumes you will not be given an incomplete URI with no scheme.)

So you can test whether the string contains a colon. If it does not, urldecode it; if the decoded string contains a colon, the original string was URL encoded. If it still does not, check whether the decoded string differs from its input: if so, urldecode again, and if not, it is not a valid URI.

You can make this loop simpler if you know what schemes you can expect.
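The colon test described above could be sketched in Java like this. The names `ColonCheck`, `decodeToUri` and `maxRounds` are invented for the illustration, and Java 10+ is assumed. The method returns the first version of the string that contains a colon, or an empty `Optional` when decoding stops changing the string without ever producing one.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class ColonCheck {

    // Decodes until a ':' (the scheme delimiter) shows up.
    // Empty result: the input does not look like a complete URI.
    static Optional<String> decodeToUri(String input, int maxRounds) {
        String current = input;
        for (int i = 0; i <= maxRounds; i++) {
            if (current.contains(":")) {
                return Optional.of(current); // this layer is not encoded
            }
            String decoded;
            try {
                decoded = URLDecoder.decode(current, StandardCharsets.UTF_8);
            } catch (IllegalArgumentException e) {
                return Optional.empty(); // malformed escape
            }
            if (decoded.equals(current)) {
                return Optional.empty(); // no colon and nothing left to decode
            }
            current = decoded;
        }
        return Optional.empty(); // gave up after maxRounds decodes
    }
}
```

Here `decodeToUri("https%253A%252F%252Fexample.com", 3)` decodes twice and returns `https://example.com`, whereas a string such as "example" yields an empty result. Note that `URLDecoder` also turns "+" into a space, so this sketch inherits that caveat.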

Answer by Alberto

Thanks to this answer I coded a function (in JavaScript) that encodes the URL just once with encodeURI, so you can call it to make sure the URL is encoded exactly once without needing to know whether it is already encoded.

ES6:

var getUrlEncoded = sURL => {
    if (decodeURI(sURL) === sURL) return encodeURI(sURL)
    return getUrlEncoded(decodeURI(sURL))
}

Pre ES6:

var getUrlEncoded = function(sURL) {
    if (decodeURI(sURL) === sURL) return encodeURI(sURL)
    return getUrlEncoded(decodeURI(sURL))
}

Here are some tests so you can see the URL is only encoded once:

getUrlEncoded("https://example.com/media/Screenshot27 UI Home.jpg")
//"https://example.com/media/Screenshot27%20UI%20Home.jpg"
getUrlEncoded(encodeURI("https://example.com/media/Screenshot27 UI Home.jpg"))
//"https://example.com/media/Screenshot27%20UI%20Home.jpg"
getUrlEncoded(encodeURI(encodeURI("https://example.com/media/Screenshot27 UI Home.jpg")))
//"https://example.com/media/Screenshot27%20UI%20Home.jpg"
getUrlEncoded(decodeURI("https://example.com/media/Screenshot27 UI Home.jpg"))
//"https://example.com/media/Screenshot27%20UI%20Home.jpg"
getUrlEncoded(decodeURI(decodeURI("https://example.com/media/Screenshot27 UI Home.jpg")))
//"https://example.com/media/Screenshot27%20UI%20Home.jpg"

Answer by subject47

Using Spring UriComponentsBuilder:

import java.net.URI;
import org.springframework.web.util.UriComponentsBuilder;

private URI getProperlyEncodedUri(String uriString) {
    try {
        return URI.create(uriString);
    } catch (IllegalArgumentException e) {
        return UriComponentsBuilder.fromUriString(uriString).build().toUri();
    }
}