PHP Multi Byte str_replace?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1451144/

Time: 2020-08-25 02:38:27  Source: igfitidea

Tags: php, string, replace, multibyte-functions

Asked by Ian

I'm trying to do accented character replacement in PHP but get funky results. My guess is that it's because I'm using a UTF-8 string and str_replace can't properly handle multi-byte strings.

$accents_search     = array('á','à','â','ã','ª','ä','å','Á','À','Â','Ã','Ä','é','è',
'ê','ë','É','È','Ê','Ë','í','ì','î','ï','Í','Ì','Î','Ï','œ','ò','ó','ô','õ','º','ø',
'Ø','Ó','Ò','Ô','Õ','ú','ù','û','Ú','Ù','Û','ç','Ç','Ñ','ñ');

$accents_replace    = array('a','a','a','a','a','a','a','A','A','A','A','A','e','e',
'e','e','E','E','E','E','i','i','i','i','I','I','I','I','oe','o','o','o','o','o','o',
'O','O','O','O','O','u','u','u','U','U','U','c','C','N','n'); 

$str = str_replace($accents_search, $accents_replace, $str);

Results I get:

Ørjan Nilsen -> ?orjan Nilsen

Expected Result:

Ørjan Nilsen -> Orjan Nilsen

Edit: I've got my internal character handler set to UTF-8 (according to mb_internal_encoding()), also the value of $str is UTF-8, so from what I can tell, all the strings involved are UTF-8. Does str_replace() detect char sets and use them properly?

Accepted answer by phsiao

Looks like the string was not replaced because your input encoding and the file encoding do not match.

Answer by dav

According to the PHP documentation, the str_replace function is binary-safe, which means that it can handle UTF-8 encoded text without any data loss.
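
As a rough illustration of both points (byte-level matching works on UTF-8, and it silently fails when the encodings differ), here is a minimal sketch. The variable names are made up for this example, and the byte escapes are only there so the result does not depend on how this file itself is saved.

```php
<?php
// "Ø" (U+00D8) encoded two ways, written as raw bytes.
$utf8_O   = "\xC3\x98"; // UTF-8 encoding of U+00D8
$latin1_O = "\xD8";     // ISO-8859-1 encoding of U+00D8

$subject = $utf8_O . 'rjan Nilsen'; // UTF-8 input string

// Same encoding on both sides: the byte sequences match, so the replacement works.
$ok = str_replace($utf8_O, 'O', $subject);

// Mismatched encodings: the needle's byte never occurs in the subject,
// so the string comes back unchanged.
$unchanged = str_replace($latin1_O, 'O', $subject);

var_dump($ok === 'Orjan Nilsen');  // bool(true)
var_dump($unchanged === $subject); // bool(true)
```

If the script file is saved in a different encoding than the incoming data, the string literals in the search array carry different bytes than the input, which would be consistent with the mismatch described in the accepted answer.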

Answer by mermshaus

It's possible to remove diacritics using Unicode normalization form D (NFD) and Unicode character properties.

NFD converts something like the "ü" umlaut from "LATIN SMALL LETTER U WITH DIAERESIS" (which is a letter) to "LATIN SMALL LETTER U" (letter) and "COMBINING DIAERESIS" (not a letter).
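
To make the decomposition concrete, here is a small sketch that normalizes a single "ü" and inspects the bytes. It assumes PHP 7 or later (for the "\u{...}" escape) and the intl extension (for Normalizer); the variable names are illustrative only.

```php
<?php
// Requires the intl extension, which provides the Normalizer class.
$precomposed = "\u{00FC}"; // "ü" as one code point: LATIN SMALL LETTER U WITH DIAERESIS
$decomposed  = Normalizer::normalize($precomposed, Normalizer::FORM_D);

// After NFD the string is "u" followed by U+0308 COMBINING DIAERESIS.
var_dump($decomposed === "u\u{0308}"); // bool(true)
var_dump(bin2hex($precomposed));       // string(4) "c3bc"
var_dump(bin2hex($decomposed));        // string(6) "75cc88"
```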

header('Content-Type: text/plain; charset=utf-8');

$test = implode('', array('á','à','â','ã','ª','ä','å','Á','À','Â','Ã','Ä','é','è',
'ê','ë','É','È','Ê','Ë','í','ì','î','ï','Í','Ì','Î','Ï','œ','ò','ó','ô','õ','º','ø',
'Ø','Ó','Ò','Ô','Õ','ú','ù','û','Ú','Ù','Û','ç','Ç','Ñ','ñ'));

$test = Normalizer::normalize($test, Normalizer::FORM_D);

// Remove everything that's not a "letter" or a space (e.g. diacritics)
// (see http://de2.php.net/manual/en/regexp.reference.unicode.php)
$pattern = '/[^\pL ]/u';

echo preg_replace($pattern, '', $test);

Output:

aaaaªaaAAAAAeeeeEEEEiiiiIIIIœooooºøØOOOOuuuUUUcCNn
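
Note that the pattern above also drops digits and punctuation, since they are neither letters nor spaces. If that is not wanted, one possible variation (not part of the original answer) is to strip only the combining marks with \p{Mn}. The helper name strip_diacritics is hypothetical, and the sketch again assumes PHP 7+ with the intl extension.

```php
<?php
// Hypothetical helper; requires the intl extension for Normalizer.
function strip_diacritics(string $text): string
{
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // normalization failed (e.g. invalid UTF-8): leave input untouched
    }
    // \p{Mn} matches nonspacing marks such as U+0301 COMBINING ACUTE ACCENT.
    // preg_replace() returns null on error, so fall back to the input in that case.
    return preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $text;
}

echo strip_diacritics("Caf\u{00E9} 24/7, d\u{00E9}j\u{00E0} vu!"), "\n"; // Cafe 24/7, deja vu!
```

Characters without a canonical decomposition (for example Ø, ø and œ) pass through unchanged with either pattern, so they would still need an explicit mapping.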

The Normalizer class is part of the PECL intl package. (The algorithm itself isn't very complicated but needs to load a lot of character mappings afaik. I wrote a PHP implementation a while ago.)

(I'm adding this two months late because I think it's a nice technique that's not known widely enough.)

Answer by Gumbo

Try this function definition:

if (!function_exists('mb_str_replace')) {
    function mb_str_replace($search, $replace, $subject) {
        // Recurse into array subjects, as str_replace() does.
        if (is_array($subject)) {
            foreach ($subject as $key => $val) {
                $subject[$key] = mb_str_replace($search, $replace, $val);
            }
            return $subject;
        }
        // Build a UTF-8 aware alternation of all (escaped) search strings.
        // Closures replace create_function(), which was removed in PHP 8.
        $quote = function ($s) { return preg_quote($s, '/'); };
        $pattern = '/(?:'.implode('|', array_map($quote, (array)$search)).')/u';
        if (is_array($search) && is_array($replace)) {
            // Pair each search string with its replacement.
            $len = min(count($search), count($replace));
            $table = array_combine(array_slice($search, 0, $len), array_slice($replace, 0, $len));
            return preg_replace_callback($pattern, function ($match) use ($table) {
                return array_key_exists($match[0], $table) ? $table[$match[0]] : $match[0];
            }, $subject);
        }
        return preg_replace($pattern, (string)$replace, $subject);
    }
}
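
For a plain one-to-one mapping like the one in the question, a lookup table passed to the built-in strtr() may be enough. This is a different technique from the function above, shown only as a sketch: the map is an illustrative subset of the question's arrays, and the "\u{...}" escapes assume PHP 7 or later.

```php
<?php
// Illustrative subset of the question's mapping, written with code point
// escapes so the example does not depend on this file's encoding.
$map = array(
    "\u{00D8}" => 'O',  // Ø
    "\u{00F8}" => 'o',  // ø
    "\u{0153}" => 'oe', // œ
    "\u{00E9}" => 'e',  // é
);

// strtr() with an array tries the longest keys first and does not
// re-scan text it has already replaced.
$result = strtr("\u{00D8}rjan Nilsen", $map);
var_dump($result); // string(12) "Orjan Nilsen"
```

Like str_replace, strtr compares bytes, so it only gives the expected result when the keys and the subject are in the same encoding.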