Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/10054818/

Convert accented characters to their plain ascii equivalents

Tags: php, utf-8, character-encoding, locale, diacritics

Asked by ram

I have to convert French characters to their plain English (ASCII) equivalents in PHP. I used the following code:

iconv("utf-8", "ascii//TRANSLIT", $string);

But the result for three accented E characters was "E"E"E.

I don't need the double quotes and other extra characters; I want output like EEE. Is there another way to convert French characters to English ones? Can you help me do this?

Answer by Ing

The PHP manual's introduction to iconv has a warning:

Note that the iconv function on some systems may not work as you expect. In such case, it'd be a good idea to install the GNU libiconv library. It will most likely end up with more consistent results.
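To see which iconv implementation a given PHP build is linked against, you can print the ICONV_IMPL and ICONV_VERSION constants that the iconv extension defines. This is only a minimal sketch: the helper name describe_iconv is made up for illustration, and the value in the comment is just what a glibc-based build might report.

```php
<?php
// Hypothetical helper: reports which iconv implementation PHP was built against.
function describe_iconv(): string
{
    // ICONV_IMPL and ICONV_VERSION are constants defined by PHP's iconv extension.
    return ICONV_IMPL . ' ' . ICONV_VERSION;
}

echo describe_iconv(), "\n"; // for example "glibc 2.31"; the value depends on the system
```

If the reported implementation is not the one you expected, that may explain transliteration results that differ from machine to machine.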

But if accented characters are the only issue, you could use a quick-and-dirty strtr() lookup (partially taken from the comments on the strtr manual page):

$string = 'Ë À Ì Â Í Ã Î Ä Ï Ç Ò È Ó É Ô Ê Õ Ö ê Ù ë Ú î Û ï Ü ô Ý õ â û ã ÿ ç';

$normalizeChars = array(
    'Š'=>'S', 'š'=>'s', 'Ð'=>'Dj','Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A',
    'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',
    'Ï'=>'I', 'Ñ'=>'N', 'Ń'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',
    'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss','à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a',
    'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
    'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ń'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u',
    'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f',
    'ă'=>'a', 'î'=>'i', 'â'=>'a', 'ș'=>'s', 'ț'=>'t', 'Ă'=>'A', 'Î'=>'I', 'Â'=>'A', 'Ș'=>'S', 'Ț'=>'T',
);

// Output: E A I A I A I A I C O E O E O E O O e U e U i U i U o Y o a u a y c
echo strtr($string, $normalizeChars);
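The lookup-table idea above can also be wrapped in a small function. The sketch below is illustrative only: the function name strip_accents_with_map and the reduced French-oriented map are assumptions rather than part of the answer, and it presumes the source file and the input are both UTF-8.

```php
<?php
// Hypothetical helper; the map is a small subset chosen for illustration.
function strip_accents_with_map(string $text): string
{
    $map = [
        'É' => 'E', 'È' => 'E', 'Ê' => 'E', 'Ë' => 'E',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'à' => 'a', 'â' => 'a', 'ç' => 'c', 'ô' => 'o',
        'ù' => 'u', 'û' => 'u', 'î' => 'i', 'ï' => 'i',
        'œ' => 'oe', 'æ' => 'ae',
    ];

    // The two-argument (array) form of strtr() replaces whole substrings,
    // so multi-byte UTF-8 keys are matched as complete characters.
    return strtr($text, $map);
}

echo strip_accents_with_map('ÉÈÊ'), "\n"; // prints EEE
```

The three-argument form strtr($text, $from, $to) works byte by byte, so it is only suitable for single-byte encodings such as ISO-8859-1; with UTF-8 text the array form is the safer choice.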

Answer by Ayoub

This worked for me with French characters.

$str = utf8_encode($str);
$str = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
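One caveat with the snippet above: utf8_encode() assumes its input is ISO-8859-1, so calling it on a string that is already UTF-8 encodes it twice, and the function is deprecated as of PHP 8.2. The sketch below shows one possible guard that converts only when the input is not valid UTF-8. The helper name to_utf8_once is hypothetical, and treating every non-UTF-8 input as ISO-8859-1 is an assumption carried over from utf8_encode().

```php
<?php
// Hypothetical helper: convert to UTF-8 only if the input is not already valid UTF-8.
function to_utf8_once(string $text): string
{
    // With the /u modifier, preg_match() returns 1 only for a valid UTF-8 subject.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }

    // Assumption: anything that is not valid UTF-8 is ISO-8859-1.
    $converted = iconv('ISO-8859-1', 'UTF-8', $text);

    return $converted === false ? $text : $converted;
}

$str = to_utf8_once("Am\xE9lie"); // ISO-8859-1 bytes in, UTF-8 bytes out
echo bin2hex($str), "\n";
```

The result can then be passed to iconv('UTF-8', 'ASCII//TRANSLIT', $str) as in the answer.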

Answer by tong

An alternative:

function replaceAccents($str) {

  $search = explode(",","ç,æ,œ,á,é,í,ó,ú,à,è,ì,ò,ù,ä,ë,ï,ö,ü,ÿ,â,ê,î,ô,û,å,ø,Ø,Å,Á,À,Â,Ä,È,É,Ê,Ë,Í,Î,Ï,Ì,Ò,Ó,Ô,Ö,Ú,Ù,Û,Ü,Ÿ,Ç,Æ,Œ");

  $replace = explode(",","c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,o,O,A,A,A,A,A,E,E,E,E,I,I,I,I,O,O,O,O,U,U,U,U,Y,C,AE,OE");

  return str_replace($search, $replace, $str);
}


$str = "À é ü ä ç";
$str = replaceAccents($str);
echo "$str \n"; 
//output "A e u a c" 

Answer by ecabuk

Here is the WordPress way:

http://codex.wordpress.org/Function_Reference/remove_accents

You can copy the remove_accents() function into your own project:

https://core.trac.wordpress.org/browser/tags/3.9.1/src/wp-includes/formatting.php#L682

Answer by Selay

Just putting this here as an alternative (a bit more complex in nature): WordPress uses this function for accent removal. I made some changes below so that it runs independently, without referencing other WordPress functions.

function mbstring_binary_safe_encoding($reset = false)
{
    static $encodings  = array();
    static $overloaded = null;

    if (is_null($overloaded)) {
        $overloaded = function_exists('mb_internal_encoding') && (ini_get('mbstring.func_overload') & 2);
    }

    if (false === $overloaded) {
        return;
    }

    if (!$reset) {
        $encoding = mb_internal_encoding();
        array_push($encodings, $encoding);
        mb_internal_encoding('ISO-8859-1');
    }

    if ($reset && $encodings) {
        $encoding = array_pop($encodings);
        mb_internal_encoding($encoding);
    }
}

function seems_utf8($str)
{
    mbstring_binary_safe_encoding();
    $length = strlen($str);
    mbstring_binary_safe_encoding(true);
    for ($i = 0; $i < $length; $i++) {
        $c = ord($str[$i]);
        if ($c < 0x80) {
            $n = 0; // 0bbbbbbb
        } elseif (($c & 0xE0) == 0xC0) {
            $n = 1; // 110bbbbb
        } elseif (($c & 0xF0) == 0xE0) {
            $n = 2; // 1110bbbb
        } elseif (($c & 0xF8) == 0xF0) {
            $n = 3; // 11110bbb
        } elseif (($c & 0xFC) == 0xF8) {
            $n = 4; // 111110bb
        } elseif (($c & 0xFE) == 0xFC) {
            $n = 5; // 1111110b
        } else {
            return false; // Does not match any model
        }
        for ($j = 0; $j < $n; $j++) {
            // Do n bytes matching 10bbbbbb follow?
            if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80)) {
                return false;
            }
        }
    }
    return true;
}

    function remove_accents($string)
{
        if (!preg_match('/[\x80-\xff]/', $string)) {
            return $string;
        }

        if (seems_utf8($string)) {
            $chars = array(
                // Decompositions for Latin-1 Supplement
                'ª' => 'a', 'º' => 'o',
                'À' => 'A', 'Á' => 'A',
                'Â' => 'A', 'Ã' => 'A',
                'Ä' => 'A', 'Å' => 'A',
                'Æ' => 'AE', 'Ç' => 'C',
                'È' => 'E', 'É' => 'E',
                'Ê' => 'E', 'Ë' => 'E',
                'Ì' => 'I', 'Í' => 'I',
                'Î' => 'I', 'Ï' => 'I',
                'Ð' => 'D', 'Ñ' => 'N',
                'Ò' => 'O', 'Ó' => 'O',
                'Ô' => 'O', 'Õ' => 'O',
                'Ö' => 'O', 'Ù' => 'U',
                'Ú' => 'U', 'Û' => 'U',
                'Ü' => 'U', 'Ý' => 'Y',
                'Þ' => 'TH', 'ß' => 's',
                'à' => 'a', 'á' => 'a',
                'â' => 'a', 'ã' => 'a',
                'ä' => 'a', 'å' => 'a',
                'æ' => 'ae', 'ç' => 'c',
                'è' => 'e', 'é' => 'e',
                'ê' => 'e', 'ë' => 'e',
                'ì' => 'i', 'í' => 'i',
                'î' => 'i', 'ï' => 'i',
                'ð' => 'd', 'ñ' => 'n',
                'ò' => 'o', 'ó' => 'o',
                'ô' => 'o', 'õ' => 'o',
                'ö' => 'o', 'ø' => 'o',
                'ù' => 'u', 'ú' => 'u',
                'û' => 'u', 'ü' => 'u',
                'ý' => 'y', 'þ' => 'th',
                'ÿ' => 'y', 'Ø' => 'O',
                // Decompositions for Latin Extended-A
                'Ā' => 'A', 'ā' => 'a',
                'Ă' => 'A', 'ă' => 'a',
                'Ą' => 'A', 'ą' => 'a',
                'Ć' => 'C', 'ć' => 'c',
                'Ĉ' => 'C', 'ĉ' => 'c',
                'Ċ' => 'C', 'ċ' => 'c',
                'Č' => 'C', 'č' => 'c',
                'Ď' => 'D', 'ď' => 'd',
                'Đ' => 'D', 'đ' => 'd',
                'Ē' => 'E', 'ē' => 'e',
                'Ĕ' => 'E', 'ĕ' => 'e',
                'Ė' => 'E', 'ė' => 'e',
                'Ę' => 'E', 'ę' => 'e',
                'Ě' => 'E', 'ě' => 'e',
                'Ĝ' => 'G', 'ĝ' => 'g',
                'Ğ' => 'G', 'ğ' => 'g',
                'Ġ' => 'G', 'ġ' => 'g',
                'Ģ' => 'G', 'ģ' => 'g',
                'Ĥ' => 'H', 'ĥ' => 'h',
                'Ħ' => 'H', 'ħ' => 'h',
                'Ĩ' => 'I', 'ĩ' => 'i',
                'Ī' => 'I', 'ī' => 'i',
                'Ĭ' => 'I', 'ĭ' => 'i',
                'Į' => 'I', 'į' => 'i',
                'İ' => 'I', 'ı' => 'i',
                'Ĳ' => 'IJ', 'ĳ' => 'ij',
                'Ĵ' => 'J', 'ĵ' => 'j',
                'Ķ' => 'K', 'ķ' => 'k',
                'ĸ' => 'k', 'Ĺ' => 'L',
                'ĺ' => 'l', 'Ļ' => 'L',
                'ļ' => 'l', 'Ľ' => 'L',
                'ľ' => 'l', 'Ŀ' => 'L',
                'ŀ' => 'l', 'Ł' => 'L',
                'ł' => 'l', 'Ń' => 'N',
                'ń' => 'n', 'Ņ' => 'N',
                'ņ' => 'n', 'Ň' => 'N',
                'ň' => 'n', 'ŉ' => 'n',
                'Ŋ' => 'N', 'ŋ' => 'n',
                'Ō' => 'O', 'ō' => 'o',
                'Ŏ' => 'O', 'ŏ' => 'o',
                'Ő' => 'O', 'ő' => 'o',
                'Œ' => 'OE', 'œ' => 'oe',
                'Ŕ' => 'R', 'ŕ' => 'r',
                'Ŗ' => 'R', 'ŗ' => 'r',
                'Ř' => 'R', 'ř' => 'r',
                'Ś' => 'S', 'ś' => 's',
                'Ŝ' => 'S', 'ŝ' => 's',
                'Ş' => 'S', 'ş' => 's',
                'Š' => 'S', 'š' => 's',
                'Ţ' => 'T', 'ţ' => 't',
                'Ť' => 'T', 'ť' => 't',
                'Ŧ' => 'T', 'ŧ' => 't',
                'Ũ' => 'U', 'ũ' => 'u',
                'Ū' => 'U', 'ū' => 'u',
                'Ŭ' => 'U', 'ŭ' => 'u',
                'Ů' => 'U', 'ů' => 'u',
                'Ű' => 'U', 'ű' => 'u',
                'Ų' => 'U', 'ų' => 'u',
                'Ŵ' => 'W', 'ŵ' => 'w',
                'Ŷ' => 'Y', 'ŷ' => 'y',
                'Ÿ' => 'Y', 'Ź' => 'Z',
                'ź' => 'z', 'Ż' => 'Z',
                'ż' => 'z', 'Ž' => 'Z',
                'ž' => 'z', 'ſ' => 's',
                // Decompositions for Latin Extended-B
                'Ș' => 'S', 'ș' => 's',
                'Ț' => 'T', 'ț' => 't',
                // Euro Sign
                '€' => 'E',
                // GBP (Pound) Sign
                '£' => '',
                // Vowels with diacritic (Vietnamese)
                // unmarked
                'Ơ' => 'O', 'ơ' => 'o',
                'Ư' => 'U', 'ư' => 'u',
                // grave accent
                'Ầ' => 'A', 'ầ' => 'a',
                'Ằ' => 'A', 'ằ' => 'a',
                'Ề' => 'E', 'ề' => 'e',
                'Ồ' => 'O', 'ồ' => 'o',
                'Ờ' => 'O', 'ờ' => 'o',
                'Ừ' => 'U', 'ừ' => 'u',
                'Ỳ' => 'Y', 'ỳ' => 'y',
                // hook
                'Ả' => 'A', 'ả' => 'a',
                'Ẩ' => 'A', 'ẩ' => 'a',
                'Ẳ' => 'A', 'ẳ' => 'a',
                'Ẻ' => 'E', 'ẻ' => 'e',
                'Ể' => 'E', 'ể' => 'e',
                'Ỉ' => 'I', 'ỉ' => 'i',
                'Ỏ' => 'O', 'ỏ' => 'o',
                'Ổ' => 'O', 'ổ' => 'o',
                'Ở' => 'O', 'ở' => 'o',
                'Ủ' => 'U', 'ủ' => 'u',
                'Ử' => 'U', 'ử' => 'u',
                'Ỷ' => 'Y', 'ỷ' => 'y',
                // tilde
                'Ẫ' => 'A', 'ẫ' => 'a',
                'Ẵ' => 'A', 'ẵ' => 'a',
                'Ẽ' => 'E', 'ẽ' => 'e',
                'Ễ' => 'E', 'ễ' => 'e',
                'Ỗ' => 'O', 'ỗ' => 'o',
                'Ỡ' => 'O', 'ỡ' => 'o',
                'Ữ' => 'U', 'ữ' => 'u',
                'Ỹ' => 'Y', 'ỹ' => 'y',
                // acute accent
                'Ấ' => 'A', 'ấ' => 'a',
                'Ắ' => 'A', 'ắ' => 'a',
                'Ế' => 'E', 'ế' => 'e',
                'Ố' => 'O', 'ố' => 'o',
                'Ớ' => 'O', 'ớ' => 'o',
                'Ứ' => 'U', 'ứ' => 'u',
                // dot below
                'Ạ' => 'A', 'ạ' => 'a',
                'Ậ' => 'A', 'ậ' => 'a',
                'Ặ' => 'A', 'ặ' => 'a',
                'Ẹ' => 'E', 'ẹ' => 'e',
                'Ệ' => 'E', 'ệ' => 'e',
                'Ị' => 'I', 'ị' => 'i',
                'Ọ' => 'O', 'ọ' => 'o',
                'Ộ' => 'O', 'ộ' => 'o',
                'Ợ' => 'O', 'ợ' => 'o',
                'Ụ' => 'U', 'ụ' => 'u',
                'Ự' => 'U', 'ự' => 'u',
                'Ỵ' => 'Y', 'ỵ' => 'y',
                // Vowels with diacritic (Chinese, Hanyu Pinyin)
                'ɑ' => 'a',
                // macron
                'Ǖ' => 'U', 'ǖ' => 'u',
                // acute accent
                'Ǘ' => 'U', 'ǘ' => 'u',
                // caron
                'Ǎ' => 'A', 'ǎ' => 'a',
                'Ǐ' => 'I', 'ǐ' => 'i',
                'Ǒ' => 'O', 'ǒ' => 'o',
                'Ǔ' => 'U', 'ǔ' => 'u',
                'Ǚ' => 'U', 'ǚ' => 'u',
                // grave accent
                'Ǜ' => 'U', 'ǜ' => 'u',
            );

            $string = strtr($string, $chars);
        } else {
            $chars = array();
            // Assume ISO-8859-1 if not UTF-8
            $chars['in'] = "\x80\x83\x8a\x8e\x9a\x9e"
                . "\x9f\xa2\xa5\xb5\xc0\xc1\xc2"
                . "\xc3\xc4\xc5\xc7\xc8\xc9\xca"
                . "\xcb\xcc\xcd\xce\xcf\xd1\xd2"
                . "\xd3\xd4\xd5\xd6\xd8\xd9\xda"
                . "\xdb\xdc\xdd\xe0\xe1\xe2\xe3"
                . "\xe4\xe5\xe7\xe8\xe9\xea\xeb"
                . "\xec\xed\xee\xef\xf1\xf2\xf3"
                . "\xf4\xf5\xf6\xf8\xf9\xfa\xfb"
                . "\xfc\xfd\xff";

            $chars['out'] = "EfSZszYcYuAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy";

            $string              = strtr($string, $chars['in'], $chars['out']);
            $double_chars        = array();
            $double_chars['in']  = array("\x8c", "\x9c", "\xc6", "\xd0", "\xde", "\xdf", "\xe6", "\xf0", "\xfe");
            $double_chars['out'] = array('OE', 'oe', 'AE', 'DH', 'TH', 'ss', 'ae', 'dh', 'th');
            $string              = str_replace($double_chars['in'], $double_chars['out'], $string);
        }

        return $string;
    }

Answer by Kaloy

What worked for me in this instance was calling utf8_encode() on the string and then applying strtr() to the encoded result, i.e.:

$normalizeChars = array(
    'Š'=>'S', 'š'=>'s', 'Ð'=>'Dj','Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A',
    'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I',
    'Ï'=>'I', 'Ñ'=>'N', 'Ń'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U',
    'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss','à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a',
    'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i',
    'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ń'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u',
    'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y', 'ƒ'=>'f',
    'ă'=>'a', 'î'=>'i', 'â'=>'a', 'ș'=>'s', 'ț'=>'t', 'Ă'=>'A', 'Î'=>'I', 'Â'=>'A', 'Ș'=>'S', 'Ț'=>'T',
);

$legal_name = utf8_encode($res['LEGAL_NAME']);
$res['LEGAL_NAME'] = strtr($legal_name, $normalizeChars);

Output: Burelle Amélie -> Burelle Amelie

Answer by aPa

In Laravel you can simply use str_slug($accentedPhrase). If you mind that this method replaces spaces with dashes (-), you can use str_replace('-', ' ', str_slug($accentedPhrase)).

Answer by Tom

Here's what helped me. It's important to set the locale to en_US rather than your real locale.

$curLocale = setlocale(LC_ALL, 0); //gets current locale
setlocale(LC_ALL, "en_US.utf8"); //without this iconv removes accented letters. If you use another locale it will also fail

$text = iconv('UTF-8', 'ASCII//TRANSLIT', $text);

setlocale(LC_ALL, $curLocale); //set locale to what it was before
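
The save, switch, and restore steps above can be wrapped so that the original locale is put back even if something goes wrong in between. This is only a sketch under assumptions: the function name translit_with_locale and its null-on-failure convention are invented here, the en_US.utf8 locale must be installed on the system, and the exact transliteration still depends on the iconv implementation.

```php
<?php
// Hypothetical wrapper around the approach above; the name and the
// null-on-failure convention are assumptions made for this sketch.
function translit_with_locale(string $text, string $locale = 'en_US.utf8'): ?string
{
    $previous = setlocale(LC_ALL, '0'); // "0" only reads the current locale
    setlocale(LC_ALL, $locale);         // returns false if the locale is not installed

    try {
        // @ hides the notice some iconv builds raise for input they cannot convert
        $result = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);
    } finally {
        if ($previous !== false) {
            setlocale(LC_ALL, $previous); // always put the original locale back
        }
    }

    return $result === false ? null : $result;
}

$ascii = translit_with_locale('Amélie');
echo $ascii === null ? "transliteration failed\n" : $ascii . "\n";
```

Returning null instead of false keeps the return type a simple nullable string; callers can then decide whether to fall back to a lookup table.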