php: is str_replace() on multibyte strings dangerous?

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/3786003/

Time: 2020-08-25 11:05:57  Source: igfitidea

php multibyte

Asked by user456885

Given certain multibyte character sets, am I correct in assuming that the following doesn't do what it was intended to do?

$string = str_replace('"', '\"', $string);

In particular, if the input is in a character set in which 0xbf5c is a valid character, an attacker can inject 0xbf22, which the replacement turns into 0xbf5c22: a valid character followed by an unescaped double quote (").
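The byte sequence described above can be reproduced directly. This is a minimal sketch assuming the mbstring extension is loaded; 'CP936' is mbstring's name for GBK, a charset in which 0xBF 0x5C is a single two-byte character:

```php
<?php
// The attacker's bytes: a GBK lead byte 0xBF followed by an ASCII '"' (0x22).
$input = "\xBF\x22";

// The "escaping" from the question inserts a backslash (0x5C) before the quote.
$escaped = str_replace('"', '\"', $input);
echo bin2hex($escaped), "\n"; // bf5c22

// Decoded as GBK, 0xBF 0x5C is one character, so the backslash is swallowed
// as a trail byte and the quote after it is left unescaped.
echo mb_strlen($escaped, 'CP936'), "\n"; // 2 (one GBK character + '"')
```

str_replace() itself did exactly what it was asked to do on bytes; the problem only appears once a GBK-aware consumer (here mb_strlen(), in practice the browser) groups the bytes into characters.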

Is there an easy way to mitigate this problem, or am I misunderstanding the issue in the first place?

(In my case, the string is going into the value attribute of an HTML input tag: echo '<input type="text" value="' . $string . '">';)

EDIT: For that matter, what about a function like preg_quote()? There's no charset argument for it, so it seems totally useless in this scenario. When you DON'T have the option of limiting charset to UTF-8 (yes, that'd be nice), it seems like you are really handicapped. What replace and quoting functions are available in that case?
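To see why a charset-less preg_quote() is a problem, consider Shift_JIS, where the character 表 (U+8868) is encoded as 0x95 0x5C and its trail byte is the ASCII backslash. A minimal sketch (the sample character is just an illustration):

```php
<?php
// 表 in Shift_JIS: lead byte 0x95, trail byte 0x5C (the ASCII backslash).
$sjis = "\x95\x5C";

// preg_quote() works on bytes, so it "escapes" the trail byte as if it were
// a backslash, producing 表 followed by a stray '\' in Shift_JIS terms.
$quoted = preg_quote($sjis);
echo bin2hex($quoted), "\n"; // 955c5c
```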

Answer by Gumbo

No, you're right: using a single-byte string function on a multibyte string can cause an unexpected result. Use the multibyte string functions instead, for example mb_ereg_replace or mb_split:

$string = mb_ereg_replace('"', '\"', $string);
$string = implode('\"', mb_split('"', $string));


Edit: Here's an mb_replace implementation using the split-join variant:

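function mb_replace($search, $replace, $subject, &$count = 0) {
    $count = 0; // total number of replacements, like str_replace()
    if (!is_array($search) && is_array($replace)) {
        return false;
    }
    if (is_array($subject)) {
        // call mb_replace for each single string in $subject
        foreach ($subject as &$string) {
            $string = mb_replace($search, $replace, $string, $c);
            $count += $c;
        }
        unset($string); // break the reference left behind by foreach
    } elseif (is_array($search)) {
        if (!is_array($replace)) {
            // every search string is replaced with the same string
            foreach ($search as $needle) {
                $subject = mb_replace($needle, $replace, $subject, $c);
                $count += $c;
            }
        } else {
            // pair $search[i] with $replace[i]; missing replacements become ''
            $search = array_values($search);
            $replace = array_values($replace);
            foreach ($search as $i => $needle) {
                $with = isset($replace[$i]) ? $replace[$i] : '';
                $subject = mb_replace($needle, $with, $subject, $c);
                $count += $c;
            }
        }
    } elseif ((string) $search !== '') {
        // split on the search string and join the pieces with $replace.
        // Note: preg_quote() escapes byte by byte and ignores mb_regex_encoding(),
        // so it can corrupt search strings in Shift_JIS/GBK-style encodings.
        $parts = mb_split(preg_quote($search), $subject);
        $count = count($parts) - 1;
        $subject = implode($replace, $parts);
    }
    return $subject;
}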

As regards the combination of parameters, this function should behave like the single-byte str_replace.
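For reference, these are the parameter combinations mb_replace() is meant to mirror, shown with the built-in str_replace() itself (the sample strings are only illustrations):

```php
<?php
// string search, string replace
$one = str_replace('"', '\"', 'say "hi"', $n1);       // 'say \"hi\"', count 2
// array search, one replacement string used for every search string
$multi = str_replace(['a', 'b'], 'x', 'abc', $n2);    // 'xxc', count 2
// array search, shorter replace array: missing replacements are ''
$pairs = str_replace(['a', 'b'], ['1'], 'abc', $n3);  // '1c', count 2
// array subject: each element is processed, counts are summed
$many = str_replace('a', 'A', ['aa', 'ba'], $n4);     // ['AA', 'bA'], count 3
```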

Answer by R.. GitHub STOP HELPING ICE

The code is perfectly safe with sane multibyte encodings like UTF-8 and EUC-TW, but dangerous with broken ones like Shift_JIS, GB*, etc. Rather than going through all the headache and overhead to be safe with these legacy encodings, I would recommend just supporting only UTF-8.
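The "sane" vs "broken" distinction comes down to byte ranges: in UTF-8 (and the EUC encodings) every byte of a multibyte character is 0x80 or above, so replacing an ASCII character such as " or \ can never touch part of a character, while GBK, Big5 and Shift_JIS allow trail bytes in the ASCII range 0x40 to 0x7E, which includes 0x5C. A small sketch, assuming mbstring ('CP936' is its name for GBK); has_ascii_byte() is a hypothetical helper written for this example:

```php
<?php
// Hypothetical helper: true if any byte of $s is in the ASCII range (< 0x80).
function has_ascii_byte(string $s): bool {
    for ($i = 0, $len = strlen($s); $i < $len; $i++) {
        if (ord($s[$i]) < 0x80) {
            return true;
        }
    }
    return false;
}

$gbk  = "\xBF\x5C";                                  // one GBK character
$utf8 = mb_convert_encoding($gbk, 'UTF-8', 'CP936'); // same character in UTF-8

var_dump(has_ascii_byte($gbk));  // bool(true): the trail byte 0x5C is '\'
var_dump(has_ascii_byte($utf8)); // bool(false): all bytes are >= 0x80
```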

Answer by reko_t

You could use mb_ereg_replace, after first specifying the charset with mb_regex_encoding(). Alternatively, if you use UTF-8, you can use preg_replace with the u modifier.
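Both options, sketched for UTF-8 data (the sample string is an assumption for illustration). A useful side effect of the u modifier is that PCRE rejects malformed UTF-8, such as the 0xBF 0x22 bytes from the question, and preg_replace() returns null instead of a half-escaped string:

```php
<?php
$s = 'a "ü" b';

// mbstring: tell the regex functions which encoding the strings use.
mb_regex_encoding('UTF-8');
$viaMb = mb_ereg_replace('"', '\"', $s);

// PCRE: the u modifier treats pattern and subject as UTF-8.
$viaPcre = preg_replace('/"/u', '\"', $s);

echo $viaMb, "\n";   // a \"ü\" b
echo $viaPcre, "\n"; // a \"ü\" b

// Malformed UTF-8 input is rejected rather than processed.
var_dump(preg_replace('/"/u', '\"', "\xBF\x22")); // NULL
```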

Answer by jeffkee

From what I understand, much of this type of string injection is solved by the mysql_real_escape_string() function.

http://php.net/manual/en/function.mysql-real-escape-string.php