Replacing accented characters php
Source: http://stackoverflow.com/questions/3371697/
These are popular Stack Overflow questions and answers, provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): StackOverflow
Question by Lizard
I am trying to replace accented characters with their plain, unaccented equivalents. Below is what I am currently doing.
$string = "éric Cantona";
$strict = strtolower($string);
echo "After Lower: ".$strict;
$patterns[0] = '/[á|â|à|å|ä]/';
$patterns[1] = '/[ð|é|ê|è|ë]/';
$patterns[2] = '/[í|î|ì|ï]/';
$patterns[3] = '/[ó|ô|ò|ø|õ|ö]/';
$patterns[4] = '/[ú|û|ù|ü]/';
$patterns[5] = '/æ/';
$patterns[6] = '/ç/';
$patterns[7] = '/ß/';
$replacements[0] = 'a';
$replacements[1] = 'e';
$replacements[2] = 'i';
$replacements[3] = 'o';
$replacements[4] = 'u';
$replacements[5] = 'ae';
$replacements[6] = 'c';
$replacements[7] = 'ss';
$strict = preg_replace($patterns, $replacements, $strict);
echo "Final: ".$strict;
This gives me:
After Lower: éric cantona
Final: ric cantona
The above gives me "ric cantona"; I want the output to be "eric cantona".
Can anyone tell me where I am going wrong?
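Two things in the question's code look like the culprits. A minimal sketch of a corrected version, assuming both the script file and the input string are UTF-8 (strip_accents_regex() is a hypothetical name): the /u modifier makes PCRE match whole UTF-8 characters instead of single bytes, and the | separators are dropped because inside [...] they match a literal pipe rather than meaning "or". ð is left out of the e class on purpose: it is a separate letter (eth), usually transliterated as d.

```php
<?php
// Sketch: the question's preg_replace() approach with two fixes.
// Assumption: the script file and the input are both UTF-8.
//  1. /u makes PCRE work on UTF-8 code points. Without it, a class such as
//     [áâàåä] is a set of single bytes, and the lead byte most of these
//     letters share (0xC3) is matched on its own, tearing characters apart.
//  2. '|' is not "or" inside [...]; it would match a literal pipe.
function strip_accents_regex(string $s): string
{
    $map = [
        '/[áâàåä]/iu'  => 'a',
        '/[éêèë]/iu'   => 'e',
        '/[íîìï]/iu'   => 'i',
        '/[óôòøõö]/iu' => 'o',
        '/[úûùü]/iu'   => 'u',
        '/æ/iu'        => 'ae',
        '/ç/iu'        => 'c',
        '/ß/u'         => 'ss',
    ];
    $s = preg_replace(array_keys($map), array_values($map), $s);
    // Lowercase last: the letters handled above are ASCII by now, so the
    // byte-based strtolower() can no longer damage them.
    return strtolower($s);
}

echo strip_accents_regex("éric Cantona"), "\n"; // eric cantona
```

Unlike a lookup table, this only covers the letters listed in the patterns.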
Answer by Lizard
I have tried all sorts of variations based on the other answers, but the following worked:
$unwanted_array = array( 'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y' );
$str = strtr( $str, $unwanted_array );
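This works because it uses the two-argument form of strtr(): each array key is matched as a whole string, longer keys are tried first, and replaced text is never searched again, so multi-byte UTF-8 keys such as 'é' are safe. The three-argument form strtr($str, $from, $to) maps single bytes and therefore breaks UTF-8. A small comparison, using a hypothetical three-entry excerpt of such a table:

```php
<?php
// Sketch contrasting the two forms of strtr() on UTF-8 input.
// $map is a tiny hypothetical excerpt of a table like the one above.
$map = ['é' => 'e', 'É' => 'E', 'ß' => 'ss'];

// Array form: each key is matched as a whole (multi-byte) string.
$good = strtr('Éric Straße', $map);
echo $good, "\n"; // Eric Strasse

// Three-argument form: $from and $to are treated as lists of single bytes.
// "é" is two bytes (0xC3 0xA9) while "e" is one, so only 0xC3 is mapped
// and a stray 0xA9 byte is left behind: the result is not valid UTF-8.
$bad = strtr('é', 'é', 'e');
var_dump(bin2hex($bad)); // string(4) "65a9"
```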
Answer by mvds
To remove the diacritics, use iconv:
$val = iconv('ISO-8859-1','ASCII//TRANSLIT',$val);
or
$val = iconv('UTF-8','ASCII//TRANSLIT',$val);
Note that PHP has some weird bug: it (sometimes?) needs a locale to be set with setlocale() to make these conversions work.
Edit: tested; it gets all of your diacritics out of the box:
$val = "á|â|à|å|ä ð|é|ê|è|ë í|î|ì|ï ó|ô|ò|ø|õ|ö ú|û|ù|ü æ ç ß abc ABC 123";
echo iconv('UTF-8','ASCII//TRANSLIT',$val);
Output (updated 2019-12-30):
a|a|a|a|a d|e|e|e|e i|i|i|i o|o|o|o|o|o u|u|u|u ae c ss abc ABC 123
Note that ð is correctly transliterated to d, not to o as in the accepted answer.
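On the locale remark: on glibc-based systems, iconv's //TRANSLIT picks transliterations according to the LC_CTYPE locale, so under the default "C" locale accented letters may come out as "?". A minimal sketch, assuming the iconv extension is loaded and that one of the listed UTF-8 locales is installed (the locale names and the function name to_ascii_iconv() are assumptions). The exact output depends on the platform's iconv implementation, so treat it as system-specific.

```php
<?php
// Sketch: iconv() transliteration with a temporarily switched LC_CTYPE.
// The locale names below are assumptions; at least one must be installed.
function to_ascii_iconv(string $s): ?string
{
    $previous = setlocale(LC_CTYPE, 0); // "0" only queries the current setting
    setlocale(LC_CTYPE, 'en_US.UTF-8', 'en_US.utf8', 'C.UTF-8');
    $out = iconv('UTF-8', 'ASCII//TRANSLIT', $s);
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous); // restore: the locale is process-wide
    }
    // iconv() returns false on failure, e.g. where //TRANSLIT is unsupported.
    return $out === false ? null : $out;
}

var_dump(to_ascii_iconv("éric Cantona"));
```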
Answer by BurninLeo
I just came across the answer from Lizard, which is extremely helpful, especially when you do some sorting. Isn't it beautiful how many chars we need to say mostly the same thing ;)
If anyone else is looking for an all-in-one solution (as far as the comments above tell), here's the copy & paste:
/**
* Replace language-specific characters by ASCII-equivalents.
* @param string $s
* @return string
*/
public static function normalizeChars($s) {
$replace = array(
'ъ'=>'-', 'Ь'=>'-', 'Ъ'=>'-', 'ь'=>'-',
'?'=>'A', '?'=>'A', 'à'=>'A', '?'=>'A', 'á'=>'A', '?'=>'A', '?'=>'A', '?'=>'A', '?'=>'Ae',
'T'=>'B',
'?'=>'C', '?'=>'C', '?'=>'C',
'è'=>'E', '?'=>'E', 'é'=>'E', '?'=>'E', 'ê'=>'E',
'?'=>'G',
'?'=>'I', '?'=>'I', '?'=>'I', 'í'=>'I', 'ì'=>'I',
'?'=>'L',
'?'=>'N', '?'=>'N',
'?'=>'O', 'ó'=>'O', 'ò'=>'O', '?'=>'O', '?'=>'O', '?'=>'Oe',
'?'=>'S', '?'=>'S', '?'=>'S', '?'=>'S',
'?'=>'T',
'ù'=>'U', '?'=>'U', 'ú'=>'U', 'ü'=>'Ue',
'Y'=>'Y',
'?'=>'Z', '?'=>'Z', '?'=>'Z',
'a'=>'a', 'ǎ'=>'a', '?'=>'a', 'á'=>'a', '?'=>'a', '?'=>'a', 'ǎ'=>'a', 'а'=>'a', 'А'=>'a', '?'=>'a', 'à'=>'a', '?'=>'a', '?'=>'a', 'ā'=>'a', '?'=>'a', 'ā'=>'a', '?'=>'ae', '?'=>'ae', '?'=>'ae', '?'=>'ae',
'б'=>'b', '?'=>'b', 'Б'=>'b', 't'=>'b',
'?'=>'c', '?'=>'c', '?'=>'c', '?'=>'c', '?'=>'c', 'ц'=>'c', '?'=>'c', '?'=>'c', 'Ц'=>'c', '?'=>'c', '?'=>'c', 'Ч'=>'ch', 'ч'=>'ch',
'?'=>'d', '?'=>'d', '?'=>'d', '?'=>'d', '?'=>'d', 'д'=>'d', 'Д'=>'D', 'e'=>'d',
'?'=>'e', '?'=>'e', 'е'=>'e', 'Е'=>'e', '?'=>'e', '?'=>'e', '?'=>'e', 'ē'=>'e', 'ē'=>'e', '?'=>'e', '?'=>'e', 'ě'=>'e', 'ě'=>'e', '?'=>'e', '?'=>'e', 'ê'=>'e', '?'=>'e', 'è'=>'e', '?'=>'e', 'é'=>'e',
'ф'=>'f', '?'=>'f', 'Ф'=>'f',
'?'=>'g', '?'=>'g', '?'=>'g', '?'=>'g', 'Г'=>'g', 'г'=>'g', '?'=>'g', '?'=>'g', '?'=>'g', '?'=>'g', '?'=>'g', '?'=>'g',
'?'=>'h', '?'=>'h', 'Х'=>'h', '?'=>'h', '?'=>'h', '?'=>'h', 'х'=>'h', '?'=>'h',
'?'=>'i', '?'=>'i', 'í'=>'i', 'ì'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', 'И'=>'i', '?'=>'i', 'ǐ'=>'i', '?'=>'i', 'ǐ'=>'i', 'и'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', 'ī'=>'i', '?'=>'i', '?'=>'i', '?'=>'i', 'ī'=>'i', '?'=>'ij', '?'=>'ij',
'й'=>'j', 'Й'=>'j', '?'=>'j', '?'=>'j', 'я'=>'ja', 'Я'=>'ja', 'Э'=>'je', 'э'=>'je', 'ё'=>'jo', 'Ё'=>'jo', 'ю'=>'ju', 'Ю'=>'ju',
'?'=>'k', '?'=>'k', '?'=>'k', 'К'=>'k', 'к'=>'k', '?'=>'k', '?'=>'k',
'?'=>'l', '?'=>'l', 'Л'=>'l', '?'=>'l', '?'=>'l', '?'=>'l', '?'=>'l', '?'=>'l', 'л'=>'l', '?'=>'l', '?'=>'l', '?'=>'l',
'?'=>'m', 'М'=>'m', '?'=>'m', 'м'=>'m',
'?'=>'n', 'н'=>'n', '?'=>'n', '?'=>'n', '?'=>'n', '?'=>'n', 'Н'=>'n', 'ń'=>'n', '?'=>'n', '?'=>'n', '?'=>'n', '?'=>'n', 'ň'=>'n',
'о'=>'o', 'О'=>'o', '?'=>'o', '?'=>'o', '?'=>'o', '?'=>'o', '?'=>'o', '?'=>'o', 'ō'=>'o', 'ō'=>'o', '?'=>'o', '?'=>'o', 'ǒ'=>'o', 'ò'=>'o', '?'=>'o', 'ǒ'=>'o', '?'=>'o', 'ó'=>'o', '?'=>'o', '?'=>'oe', '?'=>'oe', '?'=>'oe',
'?'=>'p', '?'=>'p', 'п'=>'p', 'П'=>'p',
'?'=>'q',
'?'=>'r', '?'=>'r', '?'=>'r', '?'=>'r', '?'=>'r', '?'=>'r', '?'=>'r', 'Р'=>'r', 'р'=>'r',
'?'=>'s', 'с'=>'s', '?'=>'s', '?'=>'s', '?'=>'s', '?'=>'s', '?'=>'s', 'С'=>'s', '?'=>'s', 'Щ'=>'sch', 'щ'=>'sch', 'ш'=>'sh', 'Ш'=>'sh', '?'=>'ss',
'т'=>'t', '?'=>'t', '?'=>'t', '?'=>'t', '?'=>'t', '?'=>'t', '?'=>'t', 'Т'=>'t', '?'=>'t', '?'=>'t', '?'=>'t', '?'=>'tm',
'ū'=>'u', 'у'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', 'ū'=>'u', 'ǔ'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', '?'=>'u', 'ǖ'=>'u', 'ǔ'=>'u', 'ǜ'=>'u', 'ù'=>'u', 'ú'=>'u', '?'=>'u', 'У'=>'u', 'ǚ'=>'u', 'ǜ'=>'u', 'ǚ'=>'u', 'ǘ'=>'u', 'ǖ'=>'u', 'ǘ'=>'u', 'ü'=>'ue',
'в'=>'v', '?'=>'v', 'В'=>'v',
'?'=>'w', '?'=>'w', '?'=>'w',
'ы'=>'y', '?'=>'y', 'y'=>'y', '?'=>'y', '?'=>'y', '?'=>'y',
'Ы'=>'y', '?'=>'z', 'З'=>'z', 'з'=>'z', '?'=>'z', '?'=>'z', '?'=>'z', '?'=>'z', 'Ж'=>'zh', 'ж'=>'zh'
);
return strtr($s, $replace);
}
Note some slight changes regarding the German umlauts (ä => ae).
Edit: Included more characters based on the post from user3682119 (except for the copyright symbol) and the comment from daker.
Answer by ItalyPaleAle
In PHP 5.4 the intl extension provides a new class named Transliterator.
I believe that's the best way to remove diacritics for two reasons:
Transliterator is based on ICU, so you're using the tables of the ICU library. ICU is a great project, developed over the years to provide comprehensive tables and functionality. Whatever table you write yourself, it will never be as complete as ICU's.
In UTF-8, characters can be represented in more than one way. For example, the character ñ can be stored as a single (multi-byte) code point, or as n followed by the combining tilde U+0303 (itself multi-byte). In addition, some characters in Unicode are homographs: they look the same while having different code points. For this reason it's also important to normalize the string.
Here's a sample code, taken from an old answer of mine:
<?php
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', Transliterator::FORWARD);
$test = ['abcd', 'èe', '', 'àòùìéëü', 'àòùìéëü', 'tiësto'];
foreach($test as $e) {
$normalized = $transliterator->transliterate($e);
echo $e. ' --> '.$normalized."\n";
}
?>
Result:
abcd --> abcd
èe --> ee
-->
àòùìéëü --> aouieeu
àòùìéëü --> aouieeu
tiësto --> tiesto
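The rule string passed as the first argument to Transliterator::createFromRules() does both jobs: NFD decomposes each character into a base letter plus combining marks, the [:Nonspacing Mark:] Remove step deletes those marks, and NFC recomposes whatever remains.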
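To see why the NFD step matters, the sketch below builds ñ both ways with PHP's \u{...} escapes and applies the mark-removal step on its own. It assumes only PCRE with Unicode support (PHP's default build); the Normalizer part runs only if the intl extension is loaded. remove_nonspacing_marks() is a hypothetical helper, not part of intl.

```php
<?php
// Sketch of the steps in ':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;'.
// The mark-stripping step needs only PCRE; Normalizer (intl) is optional.
$precomposed = "\u{00F1}";  // ñ as a single code point: 2 bytes in UTF-8
$decomposed  = "n\u{0303}"; // n + COMBINING TILDE: 1 + 2 bytes

var_dump(strlen($precomposed), strlen($decomposed)); // int(2), then int(3)
var_dump($precomposed === $decomposed);              // bool(false)

// The "[:Nonspacing Mark:] Remove" step: delete Unicode category Mn.
function remove_nonspacing_marks(string $s): string
{
    return preg_replace('/\p{Mn}+/u', '', $s);
}

echo remove_nonspacing_marks($decomposed), "\n";  // n
echo remove_nonspacing_marks($precomposed), "\n"; // ñ (no separate mark yet)

// NFD is what splits the precomposed form into letter + mark.
if (class_exists('Normalizer')) {
    $nfd = Normalizer::normalize($precomposed, Normalizer::FORM_D);
    echo remove_nonspacing_marks($nfd), "\n";     // n
}
```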
Answer by Kasey Thomas
So I found this on the php.net page for the preg_replace function:
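// replace accented chars
$string = "Zacarías Ferreíra"; // my definition for string variable
$accents = '/&([A-Za-z]{1,2})(grave|acute|circ|cedil|uml|lig);/';
$string_encoded = htmlentities($string, ENT_NOQUOTES, 'UTF-8');
// keep only the base letter captured in $1 (e.g. &iacute; -> i)
$string = preg_replace($accents, '$1', $string_encoded);
// decode any other entities (e.g. &amp;) back to plain characters
$string = html_entity_decode($string, ENT_NOQUOTES, 'UTF-8');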
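If you have encoding issues (UTF-8 text that was encoded twice), you may get something like "ZacarÃas FerreÃra", where each í has turned into Ã followed by an invisible soft hyphen. In that case, decode the string first and then use the code above:
$string = utf8_decode("Zacar\u{00C3}\u{00AD}as Ferre\u{00C3}\u{00AD}ra"); // utf8_decode() is deprecated as of PHP 8.2; mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8') is the replacement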
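The suffix list above misses ñ (&ntilde;), å (&aring;) and ø (&oslash;). A sketch that adds those three suffixes and wraps the approach in a hypothetical strip_accents_entities() function. Note that ß (&szlig;) still comes out as "sz" with this pattern, not "ss".

```php
<?php
// Sketch: the htmlentities() trick from the answer, wrapped in a function,
// with tilde, ring and slash added to the suffix list.
function strip_accents_entities(string $s): string
{
    $encoded = htmlentities($s, ENT_NOQUOTES, 'UTF-8');
    // &eacute; -> e, &ntilde; -> n, &AElig; -> AE, ...
    $stripped = preg_replace(
        '/&([A-Za-z]{1,2})(grave|acute|circ|cedil|uml|lig|tilde|ring|slash);/',
        '$1',
        $encoded
    );
    // Turn any remaining entities (e.g. &amp;) back into plain characters.
    return html_entity_decode($stripped, ENT_NOQUOTES, 'UTF-8');
}

echo strip_accents_entities('Zacarías Ferreíra'), "\n"; // Zacarias Ferreira
echo strip_accents_entities('Muñoz & Åström'), "\n";    // Munoz & Astrom
```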
Answer by Stergios Zg.
This worked for me:
<?php
setlocale(LC_ALL, "en_US.utf8");
$val = iconv('UTF-8','ASCII//TRANSLIT',$val);
?>
Answer by Kwaadpepper
An updated answer based on @BurninLeo's answer
function replace_spec_char($subject) {
$char_map = array(
"ъ" => "-", "ь" => "-", "Ъ" => "-", "Ь" => "-",
"А" => "A", "?" => "A", "ǎ" => "A", "?" => "A", "à" => "A", "?" => "A", "á" => "A", "?" => "A", "?" => "A", "?" => "A", "?" => "A", "ā" => "A", "?" => "A",
"Б" => "B", "?" => "B", "T" => "B",
"?" => "C", "?" => "C", "?" => "C", "Ц" => "C", "?" => "C", "?" => "C", "?" => "C", "?" => "C", "?" => "C",
"Д" => "D", "?" => "D", "?" => "D", "?" => "D", "D" => "D",
"è" => "E", "?" => "E", "é" => "E", "?" => "E", "ê" => "E", "Е" => "E", "ē" => "E", "?" => "E", "ě" => "E", "?" => "E", "?" => "E", "?" => "E", "?" => "E",
"Ф" => "F", "?" => "F",
"?" => "G", "?" => "G", "?" => "G", "?" => "G", "Г" => "G", "?" => "G", "?" => "G",
"?" => "H", "?" => "H", "Х" => "H", "?" => "H", "?" => "H",
"I" => "I", "?" => "I", "?" => "I", "í" => "I", "ì" => "I", "?" => "I", "?" => "I", "I" => "I", "И" => "I", "?" => "I", "ǐ" => "I", "?" => "I", "?" => "I", "ī" => "I", "?" => "I",
"Й" => "J", "?" => "J",
"?" => "K", "?" => "K", "?" => "K", "К" => "K", "?" => "K",
"?" => "L", "?" => "L", "Л" => "L", "?" => "L", "?" => "L", "?" => "L", "?" => "L",
"?" => "M", "М" => "M", "?" => "M",
"?" => "N", "?" => "N", "Н" => "N", "?" => "N", "?" => "N", "?" => "N", "?" => "N", "?" => "N", "?" => "N",
"?" => "O", "ó" => "O", "ò" => "O", "?" => "O", "?" => "O", "О" => "O", "?" => "O", "?" => "O", "ō" => "O", "?" => "O", "ǒ" => "O", "?" => "O",
"?" => "P", "?" => "P", "П" => "P",
"?" => "Q",
"?" => "R", "?" => "R", "?" => "R", "?" => "R", "Р" => "R", "?" => "R",
"?" => "S", "?" => "S", "?" => "S", "?" => "S", "С" => "S", "?" => "S", "?" => "S",
"Т" => "T", "?" => "T", "?" => "T", "?" => "T", "?" => "T", "?" => "T", "?" => "T",
"ù" => "U", "?" => "U", "ú" => "U", "ū" => "U", "У" => "U", "?" => "U", "?" => "U", "ǔ" => "U", "?" => "U", "?" => "U", "?" => "U", "?" => "U", "ǖ" => "U", "ǜ" => "U", "ǚ" => "U", "ǘ" => "U",
"В" => "V", "?" => "V",
"Y" => "Y", "Ы" => "Y", "?" => "Y", "?" => "Y",
"?" => "Z", "?" => "Z", "?" => "Z", "З" => "Z", "?" => "Z",
"а" => "a", "?" => "a", "ǎ" => "a", "?" => "a", "à" => "a", "?" => "a", "á" => "a", "?" => "a", "a" => "a", "?" => "a", "?" => "a", "ā" => "a", "?" => "a",
"б" => "b", "?" => "b", "t" => "b",
"?" => "c", "?" => "c", "?" => "c", "ц" => "c", "?" => "c", "?" => "c", "?" => "c", "?" => "c", "?" => "c",
"Ч" => "ch", "ч" => "ch",
"д" => "d", "?" => "d", "?" => "d", "?" => "d", "e" => "d",
"è" => "e", "?" => "e", "é" => "e", "?" => "e", "ê" => "e", "е" => "e", "ē" => "e", "?" => "e", "ě" => "e", "?" => "e", "?" => "e", "?" => "e", "?" => "e",
"ф" => "f", "?" => "f",
"?" => "g", "?" => "g", "?" => "g", "?" => "g", "г" => "g", "?" => "g", "?" => "g",
"?" => "h", "?" => "h", "х" => "h", "?" => "h", "?" => "h",
"i" => "i", "?" => "i", "?" => "i", "í" => "i", "ì" => "i", "?" => "i", "?" => "i", "?" => "i", "и" => "i", "?" => "i", "ǐ" => "i", "?" => "i", "?" => "i", "ī" => "i", "?" => "i",
"й" => "j", "Й" => "j", "?" => "j", "?" => "j",
"?" => "k", "?" => "k", "?" => "k", "к" => "k", "?" => "k",
"?" => "l", "?" => "l", "л" => "l", "?" => "l", "?" => "l", "?" => "l", "?" => "l",
"?" => "m", "м" => "m", "?" => "m",
"?" => "n", "ń" => "n", "н" => "n", "?" => "n", "?" => "n", "?" => "n", "?" => "n", "?" => "n", "ň" => "n",
"?" => "o", "ó" => "o", "ò" => "o", "?" => "o", "?" => "o", "о" => "o", "?" => "o", "?" => "o", "ō" => "o", "?" => "o", "ǒ" => "o", "?" => "o",
"?" => "p", "?" => "p", "п" => "p",
"?" => "q",
"?" => "r", "?" => "r", "?" => "r", "?" => "r", "р" => "r", "?" => "r",
"?" => "s", "?" => "s", "?" => "s", "?" => "s", "с" => "s", "?" => "s", "?" => "s",
"т" => "t", "?" => "t", "?" => "t", "?" => "t", "?" => "t", "?" => "t", "?" => "t",
"ù" => "u", "?" => "u", "ú" => "u", "ū" => "u", "у" => "u", "?" => "u", "?" => "u", "ǔ" => "u", "?" => "u", "?" => "u", "?" => "u", "?" => "u", "ǖ" => "u", "ǜ" => "u", "ǚ" => "u", "ǘ" => "u",
"в" => "v", "?" => "v",
"y" => "y", "ы" => "y", "?" => "y", "?" => "y",
"?" => "z", "?" => "z", "?" => "z", "з" => "z", "?" => "z", "?" => "z",
"?" => "tm",
"@" => "at",
"?" => "ae", "?" => "ae", "?" => "ae", "?" => "ae", "?" => "ae",
"?" => "ij", "?" => "ij",
"я" => "ja", "Я" => "ja",
"Э" => "je", "э" => "je",
"ё" => "jo", "Ё" => "jo",
"ю" => "ju", "Ю" => "ju",
"?" => "oe", "?" => "oe", "?" => "oe", "?" => "oe",
"щ" => "sch", "Щ" => "sch",
"ш" => "sh", "Ш" => "sh",
"?" => "ss",
"ü" => "ue",
"Ж" => "zh", "ж" => "zh",
);
return strtr($subject, $char_map);
}
$string = "?í ?????, ю?? ? test!";
echo replace_spec_char($string);
?í ?????, ю?? ? test!=>
Hi there, jusst a test!
This does not mix up upper- and lower-case characters, except for the longer replacements (e.g. ss, ch, sch). I also added @ and a few symbols.
Also, if you want to build a regex that matches regardless of special chars:
rss => '[r???????Рр](?:[s?с?????С?][s?с?????С?]|[?])'
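The regex idea above can be sketched as a small pattern builder: every ASCII letter of the search word becomes a character class holding the letter and its variants. accent_insensitive_pattern() and the three-entry $variants table are hypothetical (a real table would come from the list below), the search word is assumed to be plain ASCII, and unlike the rss example the sketch does not handle multi-letter equivalents such as ss/ß.

```php
<?php
// Sketch: build an accent-insensitive regex from "base letter => variants".
// Assumes $word is ASCII, since str_split() splits bytes, not characters.
function accent_insensitive_pattern(string $word, array $variants): string
{
    $parts = [];
    foreach (str_split(strtolower($word)) as $ch) {
        if (isset($variants[$ch])) {
            // Character class: the plain letter plus all of its variants.
            $parts[] = '[' . preg_quote($ch . $variants[$ch], '/') . ']';
        } else {
            $parts[] = preg_quote($ch, '/');
        }
    }
    // u: pattern and subject are UTF-8; i: ignore case.
    return '/' . implode('', $parts) . '/iu';
}

$variants = ['e' => 'éèêë', 'a' => 'áàâäå', 'o' => 'óòôöø'];
$pattern = accent_insensitive_pattern('rene', $variants);
var_dump(preg_match($pattern, 'Chez René Dupont')); // int(1)
```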
A Vala implementation of this: https://code.launchpad.net/~jeremy-munsch/synapse-project/ascii-smart/+merge/277477
Here is the base list you could work with. Using regex replacing (e.g. in Sublime Text) or a small script, you can build anything you need from this array.
"-" => "ъьЪЬ",
"A" => "А?ǎ?à?á????ā?",
"B" => "Б?T",
"C" => "???Ц?????",
"D" => "Д???D",
"E" => "è?é?êЕē?ě????",
"F" => "Ф?",
"G" => "????Г??",
"H" => "??Х??",
"I" => "I??íì??IИ?ǐ??ī?",
"J" => "Й?",
"K" => "???К?",
"L" => "??Л????",
"M" => "?М?",
"N" => "??Н??????",
"O" => "?óò??О??ō?ǒ?",
"P" => "??П",
"Q" => "?",
"R" => "????Р?",
"S" => "????С??",
"T" => "Т??????",
"U" => "ù?úūУ??ǔ????ǖǜǚǘ",
"V" => "В?",
"Y" => "YЫ??",
"Z" => "???З?",
"a" => "а?ǎ?à?á?a??ā?",
"b" => "б?t",
"c" => "???ц?????",
"ch" => "ч",
"d" => "д???e",
"e" => "è?é?êеē?ě????",
"f" => "ф?",
"g" => "????г??",
"h" => "??х??",
"i" => "i??íì???и?ǐ??ī?",
"j" => "й?",
"k" => "???к?",
"l" => "??л????",
"m" => "?м?",
"n" => "?ńн?????ň",
"o" => "?óò??о??ō?ǒ?",
"p" => "??п",
"q" => "?",
"r" => "????р?",
"s" => "????с??",
"t" => "т??????",
"u" => "ù?úūу??ǔ????ǖǜǚǘ",
"v" => "в?",
"y" => "yы??",
"z" => "???з??",
"tm" => "?",
"at" => "@",
"ae" => "?????",
"ch" => "Чч",
"ij" => "??",
"j" => "йЙ??",
"ja" => "яЯ",
"je" => "Ээ",
"jo" => "ёЁ",
"ju" => "юЮ",
"oe" => "????",
"sch" => "щЩ",
"sh" => "шШ",
"ss" => "?",
"tm" => "?",
"ue" => "ü",
"zh" => "Жж"
Answer by user3682119
protected $_convertTable = array(
'&' => 'and', '@' => 'at', '?' => 'c', '?' => 'r', 'à' => 'a',
'á' => 'a', '?' => 'a', '?' => 'a', '?' => 'a', '?' => 'ae','?' => 'c',
'è' => 'e', 'é' => 'e', '?' => 'e', 'ì' => 'i', 'í' => 'i', '?' => 'i',
'?' => 'i', 'ò' => 'o', 'ó' => 'o', '?' => 'o', '?' => 'o', '?' => 'o',
'?' => 'o', 'ù' => 'u', 'ú' => 'u', '?' => 'u', 'ü' => 'u', 'Y' => 'y',
'?' => 'ss','à' => 'a', 'á' => 'a', 'a' => 'a', '?' => 'a', '?' => 'a',
'?' => 'ae','?' => 'c', 'è' => 'e', 'é' => 'e', 'ê' => 'e', '?' => 'e',
'ì' => 'i', 'í' => 'i', '?' => 'i', '?' => 'i', 'ò' => 'o', 'ó' => 'o',
'?' => 'o', '?' => 'o', '?' => 'o', '?' => 'o', 'ù' => 'u', 'ú' => 'u',
'?' => 'u', 'ü' => 'u', 'y' => 'y', 't' => 'p', '?' => 'y', 'ā' => 'a',
'ā' => 'a', '?' => 'a', '?' => 'a', '?' => 'a', '?' => 'a', '?' => 'c',
'?' => 'c', '?' => 'c', '?' => 'c', '?' => 'c', '?' => 'c', '?' => 'c',
'?' => 'c', '?' => 'd', '?' => 'd', '?' => 'd', '?' => 'd', 'ē' => 'e',
'ē' => 'e', '?' => 'e', '?' => 'e', '?' => 'e', '?' => 'e', '?' => 'e',
'?' => 'e', 'ě' => 'e', 'ě' => 'e', '?' => 'g', '?' => 'g', '?' => 'g',
'?' => 'g', '?' => 'g', '?' => 'g', '?' => 'g', '?' => 'g', '?' => 'h',
'?' => 'h', '?' => 'h', '?' => 'h', '?' => 'i', '?' => 'i', 'ī' => 'i',
'ī' => 'i', '?' => 'i', '?' => 'i', '?' => 'i', '?' => 'i', '?' => 'i',
'?' => 'i', '?' => 'ij','?' => 'ij','?' => 'j', '?' => 'j', '?' => 'k',
'?' => 'k', '?' => 'k', '?' => 'l', '?' => 'l', '?' => 'l', '?' => 'l',
'?' => 'l', '?' => 'l', '?' => 'l', '?' => 'l', '?' => 'l', '?' => 'l',
'?' => 'n', 'ń' => 'n', '?' => 'n', '?' => 'n', '?' => 'n', 'ň' => 'n',
'?' => 'n', '?' => 'n', '?' => 'n', 'ō' => 'o', 'ō' => 'o', '?' => 'o',
'?' => 'o', '?' => 'o', '?' => 'o', '?' => 'oe','?' => 'oe','?' => 'r',
'?' => 'r', '?' => 'r', '?' => 'r', '?' => 'r', '?' => 'r', '?' => 's',
'?' => 's', '?' => 's', '?' => 's', '?' => 's', '?' => 's', '?' => 's',
'?' => 's', '?' => 't', '?' => 't', '?' => 't', '?' => 't', '?' => 't',
'?' => 't', '?' => 'u', '?' => 'u', 'ū' => 'u', 'ū' => 'u', '?' => 'u',
'?' => 'u', '?' => 'u', '?' => 'u', '?' => 'u', '?' => 'u', '?' => 'u',
'?' => 'u', '?' => 'w', '?' => 'w', '?' => 'y', '?' => 'y', '?' => 'y',
'?' => 'z', '?' => 'z', '?' => 'z', '?' => 'z', '?' => 'z', '?' => 'z',
'?' => 'z', '?' => 'e', '?' => 'f', '?' => 'o', '?' => 'o', '?' => 'u',
'?' => 'u', 'ǎ' => 'a', 'ǎ' => 'a', 'ǐ' => 'i', 'ǐ' => 'i', 'ǒ' => 'o',
'ǒ' => 'o', 'ǔ' => 'u', 'ǔ' => 'u', 'ǖ' => 'u', 'ǖ' => 'u', 'ǘ' => 'u',
'ǘ' => 'u', 'ǚ' => 'u', 'ǚ' => 'u', 'ǜ' => 'u', 'ǜ' => 'u', '?' => 'a',
'?' => 'a', '?' => 'ae','?' => 'ae','?' => 'o', '?' => 'o', '?' => 'e',
'Ё' => 'jo','?' => 'e', '?' => 'i', '?' => 'i', 'А' => 'a', 'Б' => 'b',
'В' => 'v', 'Г' => 'g', 'Д' => 'd', 'Е' => 'e', 'Ж' => 'zh','З' => 'z',
'И' => 'i', 'Й' => 'j', 'К' => 'k', 'Л' => 'l', 'М' => 'm', 'Н' => 'n',
'О' => 'o', 'П' => 'p', 'Р' => 'r', 'С' => 's', 'Т' => 't', 'У' => 'u',
'Ф' => 'f', 'Х' => 'h', 'Ц' => 'c', 'Ч' => 'ch','Ш' => 'sh','Щ' => 'sch',
'Ъ' => '-', 'Ы' => 'y', 'Ь' => '-', 'Э' => 'je','Ю' => 'ju','Я' => 'ja',
'а' => 'a', 'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e',
'ж' => 'zh','з' => 'z', 'и' => 'i', 'й' => 'j', 'к' => 'k', 'л' => 'l',
'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's',
'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch',
'ш' => 'sh','щ' => 'sch','ъ' => '-','ы' => 'y', 'ь' => '-', 'э' => 'je',
'ю' => 'ju','я' => 'ja','ё' => 'jo','?' => 'e', '?' => 'i', '?' => 'i',
'?' => 'g', '?' => 'g', '?' => 'a', '?' => 'b', '?' => 'g', '?' => 'd',
'?' => 'h', '?' => 'v', '?' => 'z', '?' => 'h', '?' => 't', '?' => 'i',
'?' => 'k', '?' => 'k', '?' => 'l', '?' => 'm', '?' => 'm', '?' => 'n',
'?' => 'n', '?' => 's', '?' => 'e', '?' => 'p', '?' => 'p', '?' => 'C',
'?' => 'c', '?' => 'q', '?' => 'r', '?' => 'w', '?' => 't', '?' => 'tm',
);
From Magento; I'm using it for basically everything.
Answer by gabo
If you have the intl extension (http://php.net/manual/en/book.intl.php) available, this will solve your problem:
$string = "éric Cantona";
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD);
echo $normalized = $transliterator->transliterate($string);
EDIT
To install the PHP extension on Ubuntu:
apt-get install php-intl
Don't forget to require the extension ext-intl in Composer, so that composer install refuses to run on systems where it is missing instead of failing later at runtime.
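Besides custom rules, Transliterator::create() accepts ICU's predefined transform IDs. A sketch, assuming the intl extension (to_ascii_icu() is a hypothetical name): the compound ID "Any-Latin; Latin-ASCII; Lower()" also handles letters that have no decomposition, such as ß, æ or ø, which the Nonspacing Mark rule above leaves untouched. The exact mappings come from ICU's data and may differ between ICU versions.

```php
<?php
// Sketch, assuming the intl extension. Returns null if intl is missing or
// the transliterator cannot be created.
function to_ascii_icu(string $s): ?string
{
    if (!class_exists('Transliterator')) {
        return null;
    }
    // Any-Latin: other scripts to Latin; Latin-ASCII: Latin to plain ASCII;
    // Lower(): lowercase the result.
    $t = Transliterator::create('Any-Latin; Latin-ASCII; Lower()');
    if ($t === null) {
        return null;
    }
    $out = $t->transliterate($s);
    return $out === false ? null : $out;
}

var_dump(to_ascii_icu('éric Cantona'));
```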
Answer by Keyne Viana
Disclaimer: I no longer support this answer (I was blind at that time). But thanks for the upvotes =P
You can use this as a basis. It comes from WordPress, where it is used to generate pretty URLs (the entry point is the slugify() function):
/**
* Converts all accent characters to ASCII characters.
*
* If there are no accent characters, then the string given is just returned.
*
* @param string $string Text that might have accent characters
* @return string Filtered string with replaced "nice" characters.
*/
function remove_accents($string) {
if (!preg_match('/[\x80-\xff]/', $string))
return $string;
if (seems_utf8($string)) {
$chars = array(
// Decompositions for Latin-1 Supplement
chr(195).chr(128) => 'A', chr(195).chr(129) => 'A',
chr(195).chr(130) => 'A', chr(195).chr(131) => 'A',
chr(195).chr(132) => 'A', chr(195).chr(133) => 'A',
chr(195).chr(135) => 'C', chr(195).chr(136) => 'E',
chr(195).chr(137) => 'E', chr(195).chr(138) => 'E',
chr(195).chr(139) => 'E', chr(195).chr(140) => 'I',
chr(195).chr(141) => 'I', chr(195).chr(142) => 'I',
chr(195).chr(143) => 'I', chr(195).chr(145) => 'N',
chr(195).chr(146) => 'O', chr(195).chr(147) => 'O',
chr(195).chr(148) => 'O', chr(195).chr(149) => 'O',
chr(195).chr(150) => 'O', chr(195).chr(153) => 'U',
chr(195).chr(154) => 'U', chr(195).chr(155) => 'U',
chr(195).chr(156) => 'U', chr(195).chr(157) => 'Y',
chr(195).chr(159) => 's', chr(195).chr(160) => 'a',
chr(195).chr(161) => 'a', chr(195).chr(162) => 'a',
chr(195).chr(163) => 'a', chr(195).chr(164) => 'a',
chr(195).chr(165) => 'a', chr(195).chr(167) => 'c',
chr(195).chr(168) => 'e', chr(195).chr(169) => 'e',
chr(195).chr(170) => 'e', chr(195).chr(171) => 'e',
chr(195).chr(172) => 'i', chr(195).chr(173) => 'i',
chr(195).chr(174) => 'i', chr(195).chr(175) => 'i',
chr(195).chr(177) => 'n', chr(195).chr(178) => 'o',
chr(195).chr(179) => 'o', chr(195).chr(180) => 'o',
chr(195).chr(181) => 'o', chr(195).chr(182) => 'o',
chr(195).chr(184) => 'o', chr(195).chr(185) => 'u',
chr(195).chr(186) => 'u', chr(195).chr(187) => 'u',
chr(195).chr(188) => 'u', chr(195).chr(189) => 'y',
chr(195).chr(191) => 'y',
// Decompositions for Latin Extended-A
chr(196).chr(128) => 'A', chr(196).chr(129) => 'a',
chr(196).chr(130) => 'A', chr(196).chr(131) => 'a',
chr(196).chr(132) => 'A', chr(196).chr(133) => 'a',
chr(196).chr(134) => 'C', chr(196).chr(135) => 'c',
chr(196).chr(136) => 'C', chr(196).chr(137) => 'c',
chr(196).chr(138) => 'C', chr(196).chr(139) => 'c',
chr(196).chr(140) => 'C', chr(196).chr(141) => 'c',
chr(196).chr(142) => 'D', chr(196).chr(143) => 'd',
chr(196).chr(144) => 'D', chr(196).chr(145) => 'd',
chr(196).chr(146) => 'E', chr(196).chr(147) => 'e',
chr(196).chr(148) => 'E', chr(196).chr(149) => 'e',
chr(196).chr(150) => 'E', chr(196).chr(151) => 'e',
chr(196).chr(152) => 'E', chr(196).chr(153) => 'e',
chr(196).chr(154) => 'E', chr(196).chr(155) => 'e',
chr(196).chr(156) => 'G', chr(196).chr(157) => 'g',
chr(196).chr(158) => 'G', chr(196).chr(159) => 'g',
chr(196).chr(160) => 'G', chr(196).chr(161) => 'g',
chr(196).chr(162) => 'G', chr(196).chr(163) => 'g',
chr(196).chr(164) => 'H', chr(196).chr(165) => 'h',
chr(196).chr(166) => 'H', chr(196).chr(167) => 'h',
chr(196).chr(168) => 'I', chr(196).chr(169) => 'i',
chr(196).chr(170) => 'I', chr(196).chr(171) => 'i',
chr(196).chr(172) => 'I', chr(196).chr(173) => 'i',
chr(196).chr(174) => 'I', chr(196).chr(175) => 'i',
chr(196).chr(176) => 'I', chr(196).chr(177) => 'i',
chr(196).chr(178) => 'IJ',chr(196).chr(179) => 'ij',
chr(196).chr(180) => 'J', chr(196).chr(181) => 'j',
chr(196).chr(182) => 'K', chr(196).chr(183) => 'k',
chr(196).chr(184) => 'k', chr(196).chr(185) => 'L',
chr(196).chr(186) => 'l', chr(196).chr(187) => 'L',
chr(196).chr(188) => 'l', chr(196).chr(189) => 'L',
chr(196).chr(190) => 'l', chr(196).chr(191) => 'L',
chr(197).chr(128) => 'l', chr(197).chr(129) => 'L',
chr(197).chr(130) => 'l', chr(197).chr(131) => 'N',
chr(197).chr(132) => 'n', chr(197).chr(133) => 'N',
chr(197).chr(134) => 'n', chr(197).chr(135) => 'N',
chr(197).chr(136) => 'n', chr(197).chr(137) => 'N',
chr(197).chr(138) => 'n', chr(197).chr(139) => 'N',
chr(197).chr(140) => 'O', chr(197).chr(141) => 'o',
chr(197).chr(142) => 'O', chr(197).chr(143) => 'o',
chr(197).chr(144) => 'O', chr(197).chr(145) => 'o',
chr(197).chr(146) => 'OE',chr(197).chr(147) => 'oe',
chr(197).chr(148) => 'R',chr(197).chr(149) => 'r',
chr(197).chr(150) => 'R',chr(197).chr(151) => 'r',
chr(197).chr(152) => 'R',chr(197).chr(153) => 'r',
chr(197).chr(154) => 'S',chr(197).chr(155) => 's',
chr(197).chr(156) => 'S',chr(197).chr(157) => 's',
chr(197).chr(158) => 'S',chr(197).chr(159) => 's',
chr(197).chr(160) => 'S', chr(197).chr(161) => 's',
chr(197).chr(162) => 'T', chr(197).chr(163) => 't',
chr(197).chr(164) => 'T', chr(197).chr(165) => 't',
chr(197).chr(166) => 'T', chr(197).chr(167) => 't',
chr(197).chr(168) => 'U', chr(197).chr(169) => 'u',
chr(197).chr(170) => 'U', chr(197).chr(171) => 'u',
chr(197).chr(172) => 'U', chr(197).chr(173) => 'u',
chr(197).chr(174) => 'U', chr(197).chr(175) => 'u',
chr(197).chr(176) => 'U', chr(197).chr(177) => 'u',
chr(197).chr(178) => 'U', chr(197).chr(179) => 'u',
chr(197).chr(180) => 'W', chr(197).chr(181) => 'w',
chr(197).chr(182) => 'Y', chr(197).chr(183) => 'y',
chr(197).chr(184) => 'Y', chr(197).chr(185) => 'Z',
chr(197).chr(186) => 'z', chr(197).chr(187) => 'Z',
chr(197).chr(188) => 'z', chr(197).chr(189) => 'Z',
chr(197).chr(190) => 'z', chr(197).chr(191) => 's',
// Euro Sign
chr(226).chr(130).chr(172) => 'E',
// GBP (Pound) Sign
chr(194).chr(163) => '');
$string = strtr($string, $chars);
} else {
// Assume ISO-8859-1 if not UTF-8
$chars['in'] = chr(128).chr(131).chr(138).chr(142).chr(154).chr(158)
.chr(159).chr(162).chr(165).chr(181).chr(192).chr(193).chr(194)
.chr(195).chr(196).chr(197).chr(199).chr(200).chr(201).chr(202)
.chr(203).chr(204).chr(205).chr(206).chr(207).chr(209).chr(210)
.chr(211).chr(212).chr(213).chr(214).chr(216).chr(217).chr(218)
.chr(219).chr(220).chr(221).chr(224).chr(225).chr(226).chr(227)
.chr(228).chr(229).chr(231).chr(232).chr(233).chr(234).chr(235)
.chr(236).chr(237).chr(238).chr(239).chr(241).chr(242).chr(243)
.chr(244).chr(245).chr(246).chr(248).chr(249).chr(250).chr(251)
.chr(252).chr(253).chr(255);
$chars['out'] = "EfSZszYcYuAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy";
$string = strtr($string, $chars['in'], $chars['out']);
$double_chars['in'] = array(chr(140), chr(156), chr(198), chr(208), chr(222), chr(223), chr(230), chr(240), chr(254));
$double_chars['out'] = array('OE', 'oe', 'AE', 'DH', 'TH', 'ss', 'ae', 'dh', 'th');
$string = str_replace($double_chars['in'], $double_chars['out'], $string);
}
return $string;
}
/**
* Checks to see if a string is utf8 encoded.
*
* @author bmorel at ssi dot fr
*
* @param string $Str The string to be checked
* @return bool True if $Str fits a UTF-8 model, false otherwise.
*/
function seems_utf8($Str) { # by bmorel at ssi dot fr
$length = strlen($Str);
for ($i = 0; $i < $length; $i++) {
if (ord($Str[$i]) < 0x80) continue; # 0bbbbbbb
elseif ((ord($Str[$i]) & 0xE0) == 0xC0) $n = 1; # 110bbbbb
elseif ((ord($Str[$i]) & 0xF0) == 0xE0) $n = 2; # 1110bbbb
elseif ((ord($Str[$i]) & 0xF8) == 0xF0) $n = 3; # 11110bbb
elseif ((ord($Str[$i]) & 0xFC) == 0xF8) $n = 4; # 111110bb
elseif ((ord($Str[$i]) & 0xFE) == 0xFC) $n = 5; # 1111110b
else return false; # Does not match any model
for ($j = 0; $j < $n; $j++) { # n bytes matching 10bbbbbb follow ?
if ((++$i == $length) || ((ord($Str[$i]) & 0xC0) != 0x80))
return false;
}
}
return true;
}
function utf8_uri_encode($utf8_string, $length = 0) {
$unicode = '';
$values = array();
$num_octets = 1;
$unicode_length = 0;
$string_length = strlen($utf8_string);
for ($i = 0; $i < $string_length; $i++) {
$value = ord($utf8_string[$i]);
if ($value < 128) {
if ($length && ($unicode_length >= $length))
break;
$unicode .= chr($value);
$unicode_length++;
} else {
if (count($values) == 0) $num_octets = ($value < 224) ? 2 : 3;
$values[] = $value;
if ($length && ($unicode_length + ($num_octets * 3)) > $length)
break;
if (count( $values ) == $num_octets) {
if ($num_octets == 3) {
$unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]) . '%' . dechex($values[2]);
$unicode_length += 9;
} else {
$unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]);
$unicode_length += 6;
}
$values = array();
$num_octets = 1;
}
}
}
return $unicode;
}
/**
* Sanitizes title, replacing whitespace with dashes.
*
* Limits the output to alphanumeric characters, underscore (_) and dash (-).
* Whitespace becomes a dash.
*
* @param string $title The title to be sanitized.
* @return string The sanitized title.
*/
function slugify($title) {
$title = strip_tags($title);
// Preserve escaped octets.
$title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
// Remove percent signs that are not part of an octet.
$title = str_replace('%', '', $title);
// Restore octets.
$title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);
$title = remove_accents($title);
if (seems_utf8($title)) {
if (function_exists('mb_strtolower')) {
$title = mb_strtolower($title, 'UTF-8');
}
$title = utf8_uri_encode($title, 200);
}
$title = strtolower($title);
$title = preg_replace('/&.+?;/', '', $title); // kill entities
$title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
$title = preg_replace('/\s+/', '-', $title);
$title = preg_replace('|-+|', '-', $title);
$title = trim($title, '-');
return $title;
}
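The hand-rolled seems_utf8() above can be replaced by PCRE's own UTF-8 validation: with the u modifier, preg_match() returns false when the subject is not valid UTF-8. This is a swapped-in technique, not WordPress's code, and is_valid_utf8() is a hypothetical name. It is also stricter: seems_utf8() accepts overlong encodings and the obsolete 5- and 6-byte forms, which RFC 3629 rules out and PCRE rejects. mb_check_encoding($s, 'UTF-8') is another option if mbstring is available.

```php
<?php
// Sketch: UTF-8 validity check using PCRE's built-in validation.
// preg_match() with the u modifier returns false for an invalid subject.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8('éric'));     // bool(true)
var_dump(is_valid_utf8("\xC3"));     // bool(false): truncated sequence
var_dump(is_valid_utf8("\xC0\xAF")); // bool(false): overlong encoding of "/"
```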

