PHP: Replace umlauts with closest 7-bit ASCII equivalent in a UTF-8 string

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/158241/

Time: 2020-08-24 21:43:20  Source: igfitidea

php utf-8 diacritics strtr

Asked by BlaM

What I want to do is remove all accents and umlauts from a string, turning "lärm" into "larm" or "andré" into "andre". I tried to utf8_decode the string and then use strtr on it, but since my source file is saved as UTF-8, I can't enter the ISO-8859-15 characters for all the umlauts: the editor inserts the UTF-8 characters instead.

Obviously a solution for this would be to have an include that's an ISO-8859-15 file, but there must be a better way than to have another required include?

echo strtr(utf8_decode($input),
           'ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ',
           'SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy');

UPDATE: Maybe I was a bit imprecise about what I am trying to do: I do not actually want to remove the umlauts, but to replace them with their closest "one character ASCII" equivalent.

Answered by Vinko Vrsalovic

iconv("utf-8","ascii//TRANSLIT",$input);

Extended example
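
The linked extended example is not reproduced on this page. As a rough sketch only, the iconv call above might be wrapped as follows. The helper name translit_to_ascii and the locale name en_US.UTF-8 are assumptions, not part of the original answer, and the transliteration result depends on the iconv implementation and the active locale, so no expected output is shown.

```php
<?php
// Hypothetical helper name; not part of the original answer.
function translit_to_ascii(string $input): string
{
    // Some iconv implementations (glibc, for example) choose transliterations
    // based on LC_CTYPE. The locale name is an assumption and must be installed
    // on the system; if it is missing, setlocale() simply returns false.
    setlocale(LC_CTYPE, 'en_US.UTF-8');

    $result = iconv('UTF-8', 'ASCII//TRANSLIT', $input);

    // iconv() returns false on failure (for example, on invalid UTF-8 input);
    // in that case this sketch falls back to the unmodified input.
    return $result === false ? $input : $result;
}

// The input is "lärm", written with an escape so the file encoding does not matter.
echo translit_to_ascii("l\u{00E4}rm"), "\n";
```

Checking the return value matters here because a failed conversion would otherwise be silently treated as an empty string.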

Answered by Alix Axel

A little trick that doesn't require setting locales or having huge translation tables:

function Unaccent($string)
{
    if (strpos($string = htmlentities($string, ENT_QUOTES, 'UTF-8'), '&') !== false)
    {
        $string = html_entity_decode(preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|tilde|uml);~i', '$1', $string), ENT_QUOTES, 'UTF-8');
    }

    return $string;
}

The only requirement for it to work properly is to save your files in UTF-8 (as you should already).
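
To make the trick easier to follow, here is a minimal step-by-step sketch of the same idea: encode accented characters as named HTML entities, keep only the base letter of each entity, then decode whatever is left. The variable names are illustrative, and the input is written with \u{...} escapes (PHP 7+) so the sketch does not depend on the encoding of the source file.

```php
<?php
// "lärm andré", written with escapes.
$input = "l\u{00E4}rm andr\u{00E9}";

// Step 1: accented characters become named entities: "l&auml;rm andr&eacute;".
$encoded = htmlentities($input, ENT_QUOTES, 'UTF-8');

// Step 2: keep only the base letter(s) captured from each accent entity.
$stripped = preg_replace(
    '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|tilde|uml);~i',
    '$1',
    $encoded
);

// Step 3: decode any remaining entities (such as &amp; or &quot;) back to characters.
$output = html_entity_decode($stripped, ENT_QUOTES, 'UTF-8');

echo $output, "\n"; // larm andre
```

Characters that have no named entity with one of the listed suffixes pass through unchanged, so this approach only covers the accents HTML has entity names for.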

Answered by gabo

You can also try this:

$string = "Fóø Bår";
$transliterator = Transliterator::createFromRules(':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD);
echo $normalized = $transliterator->transliterate($string);

But you need to have the intl extension available: http://php.net/manual/en/book.intl.php

Answered by jay

I found that this one gives the most consistent results in French and German. With the meta tag set to utf-8, I placed it in a function that returns a line from an array of words, and it works perfectly.

htmlentities($line, ENT_SUBSTITUTE, 'utf-8')

Answered by youtag

If you are using WordPress, you can use the built-in function remove_accents( $string )

https://codex.wordpress.org/Function_Reference/remove_accents

However, I noticed a bug: it doesn't work on a string with a single character.

Answered by ganji

For Arabic and Persian users, I recommend this way to remove diacritics:

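    // The literal characters were lost in extraction; these eight marks are reconstructed (an assumption) from the Windows codes listed below.
    $diacritics = array("\u{064E}", "\u{0650}", "\u{064F}", "\u{064B}", "\u{064D}", "\u{064C}", "\u{0652}", "\u{0651}");
    // $text is a placeholder for the input string; the original passed $diacritics itself as the subject.
    $search_txt = str_replace($diacritics, '', $text);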

To type these diacritics in Windows editors, either type them directly with an Arabic keyboard layout or hold Alt and type the code of the diacritic character. The codes below are the "ASCII" (Windows code page) codes, not Unicode code points:

fatha U+064E (0243), kasra U+0650 (0246), damma U+064F (0245), fathatan U+064B (0240), kasratan U+064D (0242), dammatan U+064C (0241), sukun U+0652 (0250), shadda U+0651 (0248), tatweel U+0640 (0220)

Answered by BlaM

Okay, found an obvious solution myself, but it's not the best concerning performance...

echo strtr(utf8_decode($input),
           utf8_decode('ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ'),
           'SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy');
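
As a possible alternative that skips both utf8_decode calls (utf8_decode is deprecated as of PHP 8.2), strtr also accepts an array of replacement pairs, which works directly on UTF-8 strings. This is only a sketch: the helper name replace_accents_sketch is hypothetical and the map is a small sample, not the full table from the question.

```php
<?php
// Hypothetical helper name; the map below is a small sample, not a complete table.
function replace_accents_sketch(string $input): string
{
    // Keys are UTF-8 strings written with \u{...} escapes (PHP 7+), so the
    // encoding of this source file does not matter.
    $map = [
        "\u{00C4}" => 'A', // Ä
        "\u{00D6}" => 'O', // Ö
        "\u{00DC}" => 'U', // Ü
        "\u{00E4}" => 'a', // ä
        "\u{00F6}" => 'o', // ö
        "\u{00FC}" => 'u', // ü
        "\u{00DF}" => 's', // ß
        "\u{00E9}" => 'e', // é
        "\u{00E8}" => 'e', // è
    ];

    // With an array argument, strtr() replaces whole substrings, so each
    // multi-byte UTF-8 sequence is matched as a unit.
    return strtr($input, $map);
}

echo replace_accents_sketch("l\u{00E4}rm"), "\n";  // larm
echo replace_accents_sketch("andr\u{00E9}"), "\n"; // andre
```

Characters missing from the map are left untouched, so the table has to cover every character you expect in the input.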