Notice: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/1637762/

Date: 2020-08-25 03:21:07  Source: igfitidea

Test if string is URL encoded in PHP

Tags: php, testing, url-encoding

Question by Psytronic

How can I test if a string is URL encoded?

Which of the following approaches is better?

  • Search the string for characters that should have been encoded but weren't; if any exist, the string is not encoded, or
  • Use something like this, which I wrote:

function is_urlEncoded($string){
 // Decode repeatedly until urldecode() no longer changes the string
 $test_string = $string;
 while(urldecode($test_string) !== $test_string){
  $test_string = urldecode($test_string);
 }
 // Encoding the fully decoded string once should reproduce the input
 return urlencode($test_string) === $string;
}

$t = "Hello World > how are you?";
if(is_urlEncoded($t)){
 print "Was Encoded.\n";
}else{
 print "Not Encoded.\n";
 print "Should be ".urlencode($t)."\n";
}

The above code works, but not in instances where the string has been doubly encoded, as in these examples:

  • $t = "Hello%2BWorld%2B%253E%2Bhow%2Bare%2Byou%253F";
  • $t = "Hello+World%2B%253E%2Bhow%2Bare%2Byou%253F";
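
To make "doubly encoded" concrete, one can count how many times `urldecode()` changes a string before it stops changing. The helper `count_decode_layers()` below is a hypothetical name used only for illustration. It reports two layers for both example strings, and it shares the blind spot discussed throughout this thread: a literal `+` or `%xx` in raw data is counted as a layer too.

```php
<?php
// Hypothetical helper: how many times does urldecode() change $s
// before the result stops changing?
function count_decode_layers(string $s): int
{
    $layers = 0;
    // The loop ends because every change either shortens the string
    // (a %XX escape becomes one byte) or removes '+' signs.
    while (($decoded = urldecode($s)) !== $s) {
        $s = $decoded;
        $layers++;
    }
    return $layers;
}

foreach ([
    "Hello World > how are you?",
    "Hello+World+%3E+how+are+you%3F",
    "Hello%2BWorld%2B%253E%2Bhow%2Bare%2Byou%253F",
    "Hello+World%2B%253E%2Bhow%2Bare%2Byou%253F",
] as $t) {
    echo count_decode_layers($t), "  ", $t, "\n";
}
```

This prints 0, 1, 2 and 2 in the first column. Note that `count_decode_layers("A+B")` is 1, even if the `+` was meant literally.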

Accepted answer by jheddings

You'll never know for sure whether a string is URL-encoded or was supposed to contain the literal sequence %2B. It probably depends on where the string came from, i.e. whether it was hand-crafted or produced by some application.

"Is it better to search the string for characters which would be encoded but aren't, and if any exist, conclude that it's not encoded?"

I think this is a better approach, since it would take care of things that have been done programmatically (assuming the application would not have left a non-encoded character behind).

One thing that will be confusing here: technically, the % "should be" encoded if it is meant to appear in the final value, since it is a special character. You might have to combine your approaches: look for should-be-encoded characters, and if none are found, validate that the string decodes successfully.
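
A minimal sketch of that combination, assuming "should-be-encoded" means any character that PHP's `urlencode()` would escape other than `%` and `+` (i.e. anything outside ASCII letters, digits, `-`, `_`, `.`), and "decodes successfully" means every `%` is followed by two hex digits. The function name `looks_url_encoded()` is hypothetical. A string with nothing to encode, such as `abc`, passes, which is exactly the ambiguity described above.

```php
<?php
// Hypothetical helper combining the two checks suggested above.
function looks_url_encoded(string $s): bool
{
    // 1. A character urlencode() would have escaped (anything other than
    //    letters, digits, "-", "_", ".", "%" and "+") means the string is raw.
    if (preg_match('/[^A-Za-z0-9\-_.%+]/', $s)) {
        return false;
    }
    // 2. Every "%" must start a well-formed escape: "%" plus two hex digits.
    if (preg_match('/%(?![0-9A-Fa-f]{2})/', $s)) {
        return false;
    }
    return true;
}

var_dump(looks_url_encoded("Hello+World+%3E+how+are+you%3F")); // bool(true)
var_dump(looks_url_encoded("Hello World > how are you?"));      // bool(false)
var_dump(looks_url_encoded("100%"));                            // bool(false)
```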

Answer by Irfan

I have one trick:

You can do this to prevent double encoding: always decode first, then encode again.

$string = urldecode($string);

Then encode it again:

$string = urlencode($string);

Done this way, we avoid double encoding :)
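
The trick can be wrapped in a small function (the name `normalize_url_encoding()` is hypothetical) to see both sides of it. An already-encoded string comes back unchanged and no second layer is added, but a raw `+` that was meant literally is read as an encoded space, and a string that is already double-encoded stays double-encoded, because only one layer is removed.

```php
<?php
// Hypothetical wrapper for the decode-then-encode trick.
function normalize_url_encoding(string $s): string
{
    // Remove one layer of encoding (a no-op on text without "%" or "+"),
    // then encode exactly once.
    return urlencode(urldecode($s));
}

$raw  = "Hello World > how are you?";
$once = urlencode($raw);                  // "Hello+World+%3E+how+are+you%3F"

echo normalize_url_encoding($raw), "\n";  // same as $once
echo normalize_url_encoding($once), "\n"; // still $once, no extra layer
echo normalize_url_encoding("C++"), "\n"; // "C++", which decodes to "C" plus two spaces
```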

Answer by AMB

Here is something I just put together:

if ( urlencode(urldecode($data)) === $data){
    echo 'string urlencoded';
} else {
    echo 'string is NOT urlencoded';
}
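
Wrapping the same round-trip test in a function (hypothetical name `round_trips()`) makes its behaviour easy to probe. Plain text with nothing to encode, such as `abc`, is reported as encoded, and so is `A+B`; lowercase escapes such as `%2f` are reported as not encoded, because `urlencode()` always emits uppercase hex digits.

```php
<?php
// Same round-trip test as above, as a reusable function (hypothetical name).
function round_trips(string $data): bool
{
    return urlencode(urldecode($data)) === $data;
}

foreach (["Hello+World+%3E+how+are+you%3F", "Hello World", "abc", "A+B", "a%2fb"] as $s) {
    printf("%-32s %s\n", $s, round_trips($s) ? 'urlencoded' : 'NOT urlencoded');
}
```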

Answer by Kaivosukeltaja

I think there's no foolproof way to do it. For example, consider the following:

$t = "A+B";

Is that a URL-encoded "A B", or does it need to be encoded to "A%2BB"?

Answer by user187291

Well, the term "URL-encoded" is a bit vague; perhaps a simple regex check will do the trick:

$is_encoded = preg_match('~%[0-9A-F]{2}~i', $string);
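
The regex only asks whether at least one `%XX` escape is present; `preg_match()` returns `1` on a match and `0` otherwise. A short probe, using a hypothetical wrapper `has_percent_escape()`, shows its blind spots: an encoded string whose only special characters were spaces (turned into `+`) has no `%XX` at all, while raw text that merely mentions something like `%20` is treated as encoded.

```php
<?php
// Hypothetical wrapper around the regex above.
function has_percent_escape(string $s): bool
{
    // "%" followed by two hex digits, case-insensitive (the ~i modifier).
    return preg_match('~%[0-9A-F]{2}~i', $s) === 1;
}

var_dump(has_percent_escape("Hello+World+%3E+how+are+you%3F")); // bool(true)
var_dump(has_percent_escape(urlencode("Hello World")));          // bool(false), "Hello+World"
var_dump(has_percent_escape("Use %20 for a space"));             // bool(true), yet raw text
```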

Answer by falstro

There's no reliable way to do this, as some strings stay the same through the encoding process: is "abc" encoded or not? There's no clear answer. Also, as you've encountered, some characters have multiple encodings... But...

Your decode-check-encode-check scheme fails because some characters can be encoded in more than one way. However, a slight modification to your function should be fairly reliable: just check whether decoding modifies the string; if it does, it was encoded.

It won't be foolproof, of course, as "10+20=30" will return true (+ gets converted to a space), even though we're actually just doing arithmetic. I suppose this is what your scheme is attempting to counter; I'm sorry to say that I don't think there's a perfect solution.

HTH.

Edit:
As I mentioned in my own comment (just reiterating here for clarity), a good compromise would probably be to check for invalid characters in your URL (e.g. a space); if there are any, it's not encoded. If there are none, try to decode it and see whether the string changes. This still won't handle the arithmetic above (which is impossible), but it will hopefully be sufficient.
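
A sketch of that compromise, assuming "invalid characters" means anything `urlencode()` would escape other than `%` and `+`. The name `probably_encoded()` is hypothetical. With this character set, "10+20=30" is actually caught by its unescaped `=`, but "10+20" remains ambiguous and is reported as encoded; `abc` is reported as not encoded, since decoding does not change it.

```php
<?php
// Hypothetical helper implementing the compromise described above.
function probably_encoded(string $s): bool
{
    // Step 1: a character that should have been escaped (space, "=", "?", ...)
    // means the string is not encoded.
    if (preg_match('/[^A-Za-z0-9\-_.%+]/', $s)) {
        return false;
    }
    // Step 2: otherwise, call it encoded only if decoding changes something.
    return urldecode($s) !== $s;
}

var_dump(probably_encoded("Hello+World+%3E+how+are+you%3F")); // bool(true)
var_dump(probably_encoded("10+20=30"));                       // bool(false)
var_dump(probably_encoded("10+20"));                          // bool(true), ambiguous
var_dump(probably_encoded("abc"));                            // bool(false)
```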

Answer by Sebastian

What about:

if (urldecode(trim($url)) == trim($url)) { $url_form = 'decoded'; }
  else { $url_form = 'encoded'; }

This will not work with double encoding, but I suppose that is out of scope anyway?

Answer by B L Praveen

@user187291's code works and only fails when + is not encoded.

I know this is a very old post, but this worked for me:

$is_encoded = preg_match('~%[0-9A-F]{2}~i', $string);
if($is_encoded) {
 $string  = urlencode(urldecode(str_replace(['+','='], ['%2B','%3D'], $string)));
} else {
  $string = urlencode($string);
}

Answer by phpBananas

Send a variable that flags the decoding when you are already getting the data from a URL:

?path=folder/new%20file.txt&decode=1
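
One way to read this suggestion is as an explicit protocol: the sender states whether a value carries an extra layer of encoding, so the receiver never has to guess. The sketch below assumes that reading; `read_path()` and the exact meaning of `decode=1` are hypothetical. Keep in mind that PHP's own query parsing (`parse_str()`, and likewise `$_GET`) already removes one layer of encoding, so the flag only needs to cover a layer the sender added on top.

```php
<?php
// Hypothetical receiver: "decode=1" means the sender encoded "path" one extra time.
function read_path(string $query): string
{
    parse_str($query, $params);      // already URL-decodes every value once
    $path = $params['path'] ?? '';
    if (($params['decode'] ?? '0') === '1') {
        $path = rawurldecode($path); // remove the extra layer the sender announced
    }
    return $path;
}

// Sender side: encode the value once more and announce it with decode=1.
$query = http_build_query(['path' => rawurlencode('folder/new file.txt'), 'decode' => 1]);

echo read_path($query), "\n";                       // folder/new file.txt
echo read_path('path=folder/new%20file.txt'), "\n"; // folder/new file.txt
```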

Answer by Hoytman

I am using the following test to see if strings have been urlencoded:

if (urlencode($str) !== str_replace(['%','+'], ['%25','%2B'], $str)) {
    // encoding would change more than % and +, so $str is not urlencoded yet
}

If a string has already been urlencoded, the only characters that will be changed by encoding it again are % (which starts every encoded character sequence) and + (which replaces spaces). Change those back and you should have the original string.
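
Written as a function (hypothetical name `is_already_urlencoded()`), the test returns true when encoding would only touch `%` and `+`. A quick probe: a `urlencode()`d string passes, raw text with a space fails, a double-encoded string also passes, and, as with every approach here, text with nothing to encode (`abc`) passes too.

```php
<?php
// Hypothetical wrapper for the test above.
function is_already_urlencoded(string $str): bool
{
    // If $str is already encoded, urlencode() can only change "%" (to "%25")
    // and "+" (to "%2B"). Order matters: "%" is replaced first, so the "%"
    // inside an inserted "%2B" is not escaped a second time.
    return urlencode($str) === str_replace(['%', '+'], ['%25', '%2B'], $str);
}

var_dump(is_already_urlencoded("Hello+World+%3E+how+are+you%3F")); // bool(true)
var_dump(is_already_urlencoded("Hello World > how are you?"));      // bool(false)
var_dump(is_already_urlencoded("abc"));                             // bool(true)
```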

Let me know if this works for you.