PHP: transliterate any convertible UTF-8 character into its ASCII equivalent

Question

Asked by Ivan Hušnjak

Is there any good solution out there that does this transliteration in a good manner?

I've tried using iconv(), but it is very annoying and does not behave as one might expect.

Using //TRANSLIT will try to replace what it can, leaving everything nonconvertible as "?"
Using //IGNORE will not leave "?" in the text, but it will also not transliterate, and it raises an E_NOTICE when a nonconvertible character is found, so you have to call iconv with the @ error suppressor
Using //IGNORE//TRANSLIT (as some people suggested on the PHP forum) is actually the same as //IGNORE (I tried it myself on PHP versions 5.3.2 and 5.3.13)
Also, using //TRANSLIT//IGNORE is the same as //TRANSLIT

It also uses current locale settings to transliterate.

WARNING: a lot of text and code follows!

Here are some examples:

$text = 'Regular ascii text + čćžšđ + äöüß + éĕěėëȩ + æø€ + $ + £ + @';
echo '<br />original: ' . $text;
echo '<br />regular: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//> regular: Regular ascii text + ????? + ???ss + ?????? + ae?EUR + $ + ? + @

setlocale(LC_ALL, 'en_GB');
echo '<br />en_GB: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//> en_GB: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @

setlocale(LC_ALL, 'en_GB.UTF8'); // will this work?
echo '<br />en_GB.UTF8: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//> en_GB.UTF8: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @

OK, that did convert č ć š ä ö ü ß é ĕ ě ė ë ȩ and æ, but why not đ and ø?

// now specific locales
setlocale(LC_ALL, 'hr_Hr'); // this should fix Croatian đ, right?
echo '<br />hr_Hr: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
// wrong > hr_Hr: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @

setlocale(LC_ALL, 'sv_SE'); // so this will fix Swedish ø?
echo '<br />sv_SE: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
// will not > sv_SE: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @

//this is interesting
setlocale(LC_ALL, 'de_DE');
echo '<br />de_DE: ' . iconv("UTF-8", "ASCII//TRANSLIT", $text);
//> de_DE: Regular ascii text + cczs? + aeoeuess + eeeeee + ae?EUR + $ + ? + @
// actually this is what any German would expect, since ä ö ü really are the same as ae oe ue

Let's try with //IGNORE:

echo '<br />ignore: ' . iconv("UTF-8", "ASCII//IGNORE", $text);
//> ignore: Regular ascii text + + + + + $ + + @
//+ E_NOTICE: "Notice: iconv(): Detected an illegal character in input string in /var/www/test.server.web/index.php on line 49"

// with translit?
echo '<br />ignore/translit: ' . iconv("UTF-8", "ASCII//IGNORE//TRANSLIT", $text);
//same as ignore only> ignore/translit: Regular ascii text + + + + + $ + + @
//+ E_NOTICE: "Notice: iconv(): Detected an illegal character in input string in /var/www/test.server.web/index.php on line 54"

// translit/ignore?
echo '<br />translit/ignore: ' . iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", $text);
//same as translit only> translit/ignore: Regular ascii text + cczs? + aouss + eeeeee + ae?EUR + $ + ? + @

Using this guy's solution also does not work as wanted: Regular ascii text + YYYYY + aous + eYYYeY + aoY + $ + ? + @

Even using the PECL intl Normalizer class (which is not always available even if you have PHP > 5.3.0, since the ICU library that intl depends on may not be available to PHP, e.g. on certain hosting servers) produces the wrong result:

echo '<br />normalize: ' .preg_replace('/\p{Mn}/u', '', Normalizer::normalize($text, Normalizer::FORM_KD));
//>normalize: Regular ascii text + cczsđ + aouß + eeeeee + æø€ + $ + £ + @

So is there any other way of doing this right, or is the only proper approach to use preg_replace() or str_replace() and define the transliteration tables yourself?

// appendix: I found a 2008 debate on the ZF wiki about a proposal for Zend_Filter_Transliterate, but the project was dropped because some languages cannot be converted (e.g. Chinese). Still, IMO this option should exist for any Latin- and Cyrillic-based language.

Answer 1

Accepted answer by Nicolas Grekas

The toAscii() function of Patchwork\Utf8 does exactly this, see:

https://github.com/nicolas-grekas/Patchwork-UTF8/blob/master/src/Patchwork/Utf8.php

It leverages iconv and intl's Normalizer to remove accents, split ligatures and do many other generic transliterations.
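
The general idea can be sketched roughly as follows. This is not the Patchwork\Utf8 code: the function name asciiFold, the small fallback map, and the choice of "?" as the final placeholder are all assumptions made for illustration. The Normalizer step is simply skipped when the intl extension is not loaded.

```php
<?php
// Hypothetical helper, NOT the Patchwork\Utf8 implementation.
function asciiFold(string $text): string
{
    // Step 1: decompose (NFKD) and drop combining marks, if intl is loaded.
    if (class_exists('Normalizer')) {
        $decomposed = Normalizer::normalize($text, Normalizer::FORM_KD);
        if ($decomposed !== false) {
            $text = preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $text;
        }
    }

    // Step 2: letters and symbols that have no Unicode decomposition.
    // The replacements chosen here are assumptions (e.g. 'ss' for ß).
    $text = strtr($text, [
        'đ' => 'd',  'Đ' => 'D',
        'ø' => 'o',  'Ø' => 'O',
        'æ' => 'ae', 'Æ' => 'AE',
        'ß' => 'ss', '€' => 'EUR', '£' => 'GBP',
    ]);

    // Step 3: anything still outside ASCII becomes '?'.
    return preg_replace('/[^\x00-\x7F]/u', '?', $text) ?? $text;
}

echo asciiFold('đ + ø + £'), "\n"; // prints "d + o + GBP"
```

With intl available, accented letters such as č or é are reduced to their base letters in step 1; without it they fall through to step 3 and come out as "?".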

Answer 2

Answered by Alain Tiemblo

I found something on this website that might help you:

function removeAccents($str)
{
  $a = array('À', 'Á', 'Â', 'Ã', 'Ä', 'Å', 'Æ', 'Ç', 'È', 'É', 'Ê', 'Ë', 'Ì', 'Í', 'Î', 'Ï', 'Ð', 'Ñ', 'Ò', 'Ó', 'Ô', 'Õ', 'Ö', 'Ø', 'Ù', 'Ú', 'Û', 'Ü', 'Ý', 'ß', 'à', 'á', 'â', 'ã', 'ä', 'å', 'æ', 'ç', 'è', 'é', 'ê', 'ë', 'ì', 'í', 'î', 'ï', 'ñ', 'ò', 'ó', 'ô', 'õ', 'ö', 'ø', 'ù', 'ú', 'û', 'ü', 'ý', 'ÿ', 'Ā', 'ā', 'Ă', 'ă', 'Ą', 'ą', 'Ć', 'ć', 'Ĉ', 'ĉ', 'Ċ', 'ċ', 'Č', 'č', 'Ď', 'ď', 'Đ', 'đ', 'Ē', 'ē', 'Ĕ', 'ĕ', 'Ė', 'ė', 'Ę', 'ę', 'Ě', 'ě', 'Ĝ', 'ĝ', 'Ğ', 'ğ', 'Ġ', 'ġ', 'Ģ', 'ģ', 'Ĥ', 'ĥ', 'Ħ', 'ħ', 'Ĩ', 'ĩ', 'Ī', 'ī', 'Ĭ', 'ĭ', 'Į', 'į', 'İ', 'ı', 'Ĳ', 'ĳ', 'Ĵ', 'ĵ', 'Ķ', 'ķ', 'Ĺ', 'ĺ', 'Ļ', 'ļ', 'Ľ', 'ľ', 'Ŀ', 'ŀ', 'Ł', 'ł', 'Ń', 'ń', 'Ņ', 'ņ', 'Ň', 'ň', 'ŉ', 'Ō', 'ō', 'Ŏ', 'ŏ', 'Ő', 'ő', 'Œ', 'œ', 'Ŕ', 'ŕ', 'Ŗ', 'ŗ', 'Ř', 'ř', 'Ś', 'ś', 'Ŝ', 'ŝ', 'Ş', 'ş', 'Š', 'š', 'Ţ', 'ţ', 'Ť', 'ť', 'Ŧ', 'ŧ', 'Ũ', 'ũ', 'Ū', 'ū', 'Ŭ', 'ŭ', 'Ů', 'ů', 'Ű', 'ű', 'Ų', 'ų', 'Ŵ', 'ŵ', 'Ŷ', 'ŷ', 'Ÿ', 'Ź', 'ź', 'Ż', 'ż', 'Ž', 'ž', 'ſ', 'ƒ', 'Ơ', 'ơ', 'Ư', 'ư', 'Ǎ', 'ǎ', 'Ǐ', 'ǐ', 'Ǒ', 'ǒ', 'Ǔ', 'ǔ', 'Ǖ', 'ǖ', 'Ǘ', 'ǘ', 'Ǚ', 'ǚ', 'Ǜ', 'ǜ', 'Ǻ', 'ǻ', 'Ǽ', 'ǽ', 'Ǿ', 'ǿ');
  $b = array('A', 'A', 'A', 'A', 'A', 'A', 'AE', 'C', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I', 'D', 'N', 'O', 'O', 'O', 'O', 'O', 'O', 'U', 'U', 'U', 'U', 'Y', 's', 'a', 'a', 'a', 'a', 'a', 'a', 'ae', 'c', 'e', 'e', 'e', 'e', 'i', 'i', 'i', 'i', 'n', 'o', 'o', 'o', 'o', 'o', 'o', 'u', 'u', 'u', 'u', 'y', 'y', 'A', 'a', 'A', 'a', 'A', 'a', 'C', 'c', 'C', 'c', 'C', 'c', 'C', 'c', 'D', 'd', 'D', 'd', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'G', 'g', 'G', 'g', 'G', 'g', 'G', 'g', 'H', 'h', 'H', 'h', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'IJ', 'ij', 'J', 'j', 'K', 'k', 'L', 'l', 'L', 'l', 'L', 'l', 'L', 'l', 'l', 'l', 'N', 'n', 'N', 'n', 'N', 'n', 'n', 'O', 'o', 'O', 'o', 'O', 'o', 'OE', 'oe', 'R', 'r', 'R', 'r', 'R', 'r', 'S', 's', 'S', 's', 'S', 's', 'S', 's', 'T', 't', 'T', 't', 'T', 't', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'W', 'w', 'Y', 'y', 'Y', 'Z', 'z', 'Z', 'z', 'Z', 'z', 's', 'f', 'O', 'o', 'U', 'u', 'A', 'a', 'I', 'i', 'O', 'o', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'A', 'a', 'AE', 'ae', 'O', 'o');
  return str_replace($a, $b, $str);
}

Usage example:

$text = 'Regular ascii text + čćžšđ + äöüß + éĕěėëȩ + æø€ + $ + £ + @';
echo removeAccents($text);

Displays:

Regular ascii text + cczsd + aous + eeeeeȩ + aeo€ + $ + £ + @

You'll need to improve it, but you get the idea. If there is a more direct way to do this, I don't know of one.
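
One possible improvement, sketched under assumptions: keep the table as a single associative array and apply it with strtr(), so each character sits next to its replacement and the two lists cannot drift out of alignment. The constant ASCII_MAP, the function name removeAccentsMap, and the short map are illustrative only; a real table would carry every pair from the arrays above.

```php
<?php
// Illustrative subset of a transliteration table (an assumption, not a complete list).
const ASCII_MAP = [
    'č' => 'c', 'ć' => 'c', 'ž' => 'z', 'š' => 's', 'đ' => 'd',
    'ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'ß' => 's',
    'é' => 'e', 'ě' => 'e', 'ë' => 'e',
    'æ' => 'ae', 'ø' => 'o',
];

// Hypothetical helper: replaces every mapped character in a single pass.
function removeAccentsMap(string $str, array $map = ASCII_MAP): string
{
    // strtr() with an array tries the longest key first and never
    // re-scans text it has already replaced.
    return strtr($str, $map);
}

echo removeAccentsMap('čćžšđ + äöüß + æø'), "\n"; // prints "cczsd + aous + aeo"
```

Characters missing from the map pass through unchanged, so gaps in the table are easy to spot in the output.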

Answer 3

Answered by user3914203

As none of the solutions above worked for me (I needed to transliterate many European character sets to ASCII), I finally found this old PECL package, which just seemed to work: http://derickrethans.nl/projects.html#translit. I had problems especially with Cyrillic character sets, and this package seems to handle them perfectly.

Answer 4

Answered by Luke Madhanga

If I have understood you correctly, I may have an answer for you: I've written a basic PHP class that allows you to convert most characters into their ASCII equivalents.

Below is a screenshot of its output converting various composer names with accents in their name.

You can fork it from GitHub here: https://github.com/LukeMadhanga/transliterator.

NB: It is as yet undocumented, but it should be very easy to get to grips with.

Answer 5

Answered by Alex

I think setting the right locale is the way to go. Be aware that the specific locale must also be available on the system; check with locale -a. If you only have de_DE.utf8, then that is the exact name you have to pass: setlocale(LC_ALL, 'de_DE.utf8').
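
A minimal sketch of that advice, under a few assumptions: the helper name translitWithLocale is hypothetical, it switches only LC_CTYPE (assumed here to be the category iconv consults; the question used LC_ALL), and it treats a failed iconv() call as "no result". setlocale() accepts an array of names and returns false when none is installed, so the return value shows whether the locale really took effect.

```php
<?php
// Hypothetical helper: try a list of locale names, transliterate with the
// first one the system accepts, then restore the previous LC_CTYPE locale.
function translitWithLocale(string $text, array $candidates): ?string
{
    $previous = setlocale(LC_CTYPE, '0');         // '0' only queries the current value
    $active   = setlocale(LC_CTYPE, $candidates); // first accepted name, or false
    if ($active === false) {
        return null;                              // none of the locales is available
    }

    // @ hides the notice iconv() raises for characters it cannot convert.
    $result = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);           // put the old locale back
    }
    return $result === false ? null : $result;
}

// Spellings differ between systems, so several are offered.
var_dump(translitWithLocale('äöü', ['de_DE.utf8', 'de_DE.UTF-8', 'de_DE']));
```

What the var_dump shows depends on which locales are installed and on the iconv implementation PHP was built against, so treat the result as something to check on the target server.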
