Original URL: http://stackoverflow.com/questions/7461406/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Cyrillic transliteration in PHP
Asked by Sfisioza
How to transliterate cyrillic characters into latin letters?
E.g. Главная страница -> Glavnaja stranica
This Transliteration PHP Extension would do this very well, but I can't install it on my server.
Ideally, I'd like the same implementation written in plain PHP.
Answer by Tural Ali
Try the following code:
$textcyr="Тествам с кирилица";
$textlat="I pone dotuk raboti!";
$cyr = [
'а','б','в','г','д','е','ё','ж','з','и','й','к','л','м','н','о','п',
'р','с','т','у','ф','х','ц','ч','ш','щ','ъ','ы','ь','э','ю','я',
'А','Б','В','Г','Д','Е','Ё','Ж','З','И','Й','К','Л','М','Н','О','П',
'Р','С','Т','У','Ф','Х','Ц','Ч','Ш','Щ','Ъ','Ы','Ь','Э','Ю','Я'
];
$lat = [
'a','b','v','g','d','e','io','zh','z','i','y','k','l','m','n','o','p',
'r','s','t','u','f','h','ts','ch','sh','sht','a','i','y','e','yu','ya',
'A','B','V','G','D','E','Io','Zh','Z','I','Y','K','L','M','N','O','P',
'R','S','T','U','F','H','Ts','Ch','Sh','Sht','A','I','Y','E','Yu','Ya'
];
// Cyrillic -> Latin
$textcyr = str_replace($cyr, $lat, $textcyr);
// Latin -> Cyrillic (lossy: e.g. both 'а' and 'ъ' become 'a')
$textlat = str_replace($lat, $cyr, $textlat);
echo("$textcyr $textlat");
Answer by bobef
@Tural Teyyuboglu
Your code has a problem: if you transliterate, for example, "щеки" to Latin and then back to Cyrillic, you get something like "схтеки". The letters that map to multi-character Latin sequences must come first in the array, like this:
function transliterate($textcyr = null, $textlat = null) {
    // Letters whose Latin form has several characters come first, in BOTH cases;
    // otherwise the lowercase 'h' and 't' would be replaced before 'Sht' is tried.
    $cyr = array(
        'ж', 'ч', 'щ', 'ш', 'ю', 'Ж', 'Ч', 'Щ', 'Ш', 'Ю',
        'а', 'б', 'в', 'г', 'д', 'е', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ъ', 'ь', 'я',
        'А', 'Б', 'В', 'Г', 'Д', 'Е', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р', 'С', 'Т', 'У', 'Ф', 'Х', 'Ц', 'Ъ', 'Ь', 'Я');
    $lat = array(
        'zh', 'ch', 'sht', 'sh', 'yu', 'Zh', 'Ch', 'Sht', 'Sh', 'Yu',
        'a', 'b', 'v', 'g', 'd', 'e', 'z', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'h', 'c', 'y', 'x', 'q',
        'A', 'B', 'V', 'G', 'D', 'E', 'Z', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'F', 'H', 'C', 'Y', 'X', 'Q');
    if ($textcyr) return str_replace($cyr, $lat, $textcyr);      // Cyrillic -> Latin
    else if ($textlat) return str_replace($lat, $cyr, $textlat); // Latin -> Cyrillic
    else return null;
}
echo transliterate(null, transliterate("щеки")) == "щеки";
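An alternative that does not depend on array order is strtr() with a single associative array. According to the PHP manual, it tries the longest keys first and never searches replaced text again, so "sht" wins over "s", "h" and "t". Below is a minimal sketch that uses the lowercase part of the mapping above. The constant and function names are hypothetical, and uppercase pairs would be added to the same table in the same way:

```php
<?php
// Hypothetical table: one Cyrillic -> Latin map reused in both directions.
const CYR_TO_LAT = [
    'а' => 'a', 'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e',
    'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'j', 'к' => 'k', 'л' => 'l',
    'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's',
    'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch',
    'ш' => 'sh', 'щ' => 'sht', 'ъ' => 'y', 'ь' => 'x', 'ю' => 'yu', 'я' => 'q',
];

function toLatin(string $text): string
{
    // strtr() with an array tries the longest matching key at each position.
    return strtr($text, CYR_TO_LAT);
}

function toCyrillic(string $text): string
{
    // array_flip() is safe only because every Latin value above is unique.
    return strtr($text, array_flip(CYR_TO_LAT));
}

echo toLatin('щеки'), ' ', toCyrillic(toLatin('щеки')), PHP_EOL; // shteki щеки
```

The reverse direction only works because no two letters share a Latin form. Tables that map two letters to the same string, such as 'а' and 'ъ' both becoming 'a' in the first answer, cannot round-trip this way.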
Answer by Ilyich
The best option is to use the PHP Intl extension. You may need to install it first.
This will do the trick:
$transliteratedString = transliterator_transliterate('Russian-Latin/BGN', $cyrillicString);
I used 'Russian-Latin/BGN' because the question uses Russian. There are also transliterators for other languages written in the Cyrillic script. To list all of them, run:
print_r(transliterator_list_ids());
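The question mentions servers where extensions cannot be installed, so a caller may want to use intl only when it is present. The following is a minimal sketch that assumes PHP 7.1 or later. The helper name `toLatinIntl` is hypothetical. 'Any-Latin; Latin-ASCII' is a standard ICU compound ID chosen here for illustration; it is not part of the answer:

```php
<?php
/**
 * Hypothetical helper: transliterate with ICU when the intl extension is loaded.
 * Returns null if intl is missing or the transliterator ID is not recognised,
 * so the caller can fall back to a hand-written table.
 */
function toLatinIntl(string $text, string $id = 'Russian-Latin/BGN'): ?string
{
    if (!class_exists('Transliterator')) {
        return null; // intl extension not installed
    }
    $tr = Transliterator::create($id);
    if ($tr === null) {
        return null; // ID not available in this ICU build
    }
    $result = $tr->transliterate($text);
    return $result === false ? null : $result;
}

// Compound ID: convert any script to Latin, then strip the remaining accents.
var_dump(toLatinIntl('Главная страница', 'Any-Latin; Latin-ASCII'));
```

Returning null instead of failing lets you fall back to one of the hand-written tables on this page.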
Answer by Boris Janjetovic
Here is a function I use for cleaning up characters in Bosnian, Croatian and Serbian Latin text:
function cleanUTF($name){
$name = str_replace(array('?','?','?','?','?','?','?'),array('s','c','d','c','c','z','n'), $name);
$name = str_replace(array('?','?','?','?','?', '?','?'),array('S','C','D','C','C','Z','N'), $name);
$name = str_replace(array('а','б','в','г','д','е','ё','ж','з','и','й','к','л','љ','м','н','њ','о','п','р','с','т','у','ф','х','ц','ч','џ','ш','щ','ъ','ы','ь','э','ю','я','А','Б','В','Г','Д','Е','Ё','Ж','З','И','Й','К','Л','Љ','М','Н','Њ','О','П','Р','С','Т','У','Ф','Х','Ц','Ч','Џ','Ш','Щ','Ъ','Ы','Ь','Э','Ю','Я'),
array('a','b','v','g','d','e','e','z','z','i','j','k','l','lj','m','n','nj','o','p','r','s','t','u','f','h','c','c','dz','s','s','i','j','j','e','ju','ja','A','B','V','G','D','E','E','Z','Z','I','J','K','L','Lj','M','N','Nj','O','P','R','S','T','U','F','H','C','C','Dz','S','S','I','J','J','E','Ju','Ja'), $name);
return $name;
}
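For the Latin side, a much smaller table is enough. Gaj's Latin alphabet, used for Bosnian, Croatian and Serbian, adds only five letters with diacritics: č, ć, đ, š and ž. Below is a minimal sketch under that assumption. It uses strtr() and maps đ to "d" as the answer does. The function name is hypothetical:

```php
<?php
// Hypothetical helper: strip the five diacritic letters of Gaj's Latin alphabet.
// The digraphs (dž, lj, nj) need no entry of their own: dž becomes "dz" via ž -> z.
function stripGajDiacritics(string $text): string
{
    return strtr($text, [
        'č' => 'c', 'ć' => 'c', 'đ' => 'd', 'š' => 's', 'ž' => 'z',
        'Č' => 'C', 'Ć' => 'C', 'Đ' => 'D', 'Š' => 'S', 'Ž' => 'Z',
    ]);
}

echo stripGajDiacritics('Đurđevac, Šibenik, Čakovec'), PHP_EOL; // Durdevac, Sibenik, Cakovec
```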
Answer by Kerrek SB
Answer by Av007
$textcyr="Тест на кирилице";
$textlat="Test na kirilitse!";
$cyr = array('а','б','в','г','д','е','ё','ж','з','и','й','к','л','м','н','о','п','р','с','т','у',
'ф','х','ц','ч','ш','щ','ъ', 'ы','ь', 'э', 'ю','я','А','Б','В','Г','Д','Е','Ё','Ж','З','И','Й','К','Л','М','Н','О','П','Р','С','Т','У',
'Ф','Х','Ц','Ч','Ш','Щ','Ъ', 'Ы','Ь', 'Э', 'Ю','Я' );
// Both arrays must have the same length: str_replace() replaces any extra search entries with ''.
$lat = array( 'a','b','v','g','d','e','io','zh','z','i','y','k','l','m','n','o','p','r','s','t','u',
'f' ,'h' ,'ts' ,'ch','sh' ,'sht' ,'a', 'i', 'y', 'e' ,'yu' ,'ya','A','B','V','G','D','E','Io','Zh',
'Z','I','Y','K','L','M','N','O','P','R','S','T','U',
'F' ,'H' ,'Ts' ,'Ch','Sh' ,'Sht' ,'A' ,'I' ,'Y' ,'E' ,'Yu' ,'Ya' );
$textcyr = str_replace($cyr, $lat, $textcyr);
$textlat = str_replace($lat, $cyr, $textlat);
echo("$textcyr $textlat");
Missing ё, э, ы (Э, Ы, Ё) letters.
Answer by Tomasz Kapłoński
I wrote a complete UTF-8 transliteration class for all European languages, and it may help. The comments are in Polish, but there aren't many of them, so here are a few hints:
- Numbers stored in constants are idCountry values from my local database; change them as you like.
- "Rób transliterację dla " means "do transliteration for "; you select the country by the constant name.
- "Słownik tłumaczący rosyjską cyrylicę wg standardu " means "dictionary transliterating Russian Cyrillic according to the standard ".
- "Tablica wycinająca akcenty z różnych znaków narodowych pobrana z http://stuffofinterest.com/misc/utf8-about.html" means "array that strips accents from various national characters, taken from http://stuffofinterest.com/misc/utf8-about.html". It may help if you find errors in iconv or cannot use it for some reason.
- The utf2ascii and cyr2lat methods are self-explanatory.
I hope it helps a few people, because implementing it was a nightmare :)
Edit: I just noticed that part of the code was missing, so I've posted the full class on Pastie.
Answer by pc_
This one worked best for me. The code is from this page:
function ru2lat($str)
{
$tr = array(
"А"=>"a", "Б"=>"b", "В"=>"v", "Г"=>"g", "Д"=>"d",
"Е"=>"e", "Ё"=>"yo", "Ж"=>"zh", "З"=>"z", "И"=>"i",
"Й"=>"j", "К"=>"k", "Л"=>"l", "М"=>"m", "Н"=>"n",
"О"=>"o", "П"=>"p", "Р"=>"r", "С"=>"s", "Т"=>"t",
"У"=>"u", "Ф"=>"f", "Х"=>"kh", "Ц"=>"ts", "Ч"=>"ch",
"Ш"=>"sh", "Щ"=>"sch", "Ъ"=>"", "Ы"=>"y", "Ь"=>"",
"Э"=>"e", "Ю"=>"yu", "Я"=>"ya", "а"=>"a", "б"=>"b",
"в"=>"v", "г"=>"g", "д"=>"d", "е"=>"e", "ё"=>"yo",
"ж"=>"zh", "з"=>"z", "и"=>"i", "й"=>"j", "к"=>"k",
"л"=>"l", "м"=>"m", "н"=>"n", "о"=>"o", "п"=>"p",
"р"=>"r", "с"=>"s", "т"=>"t", "у"=>"u", "ф"=>"f",
"х"=>"kh", "ц"=>"ts", "ч"=>"ch", "ш"=>"sh", "щ"=>"sch",
"ъ"=>"", "ы"=>"y", "ь"=>"", "э"=>"e", "ю"=>"yu",
"я"=>"ya", " "=>"-", "."=>"", ","=>"", "/"=>"-",
":"=>"", ";"=>"","—"=>"", "–"=>"-"
);
return strtr($str,$tr);
}
Hope this helps someone.
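ru2lat() maps both letter cases to lowercase and turns spaces into hyphens, which suggests it targets URL slugs. The following sketch finishes that job and assumes ASCII-only slugs are wanted. The names and the cleanup regex are my own, not from the answer. Unlike ru2lat(), it collapses any remaining punctuation into single hyphens instead of dropping it:

```php
<?php
// Same letter values as ru2lat(), written as parallel lists to keep the sketch short.
const RU_LOWER = 'абвгдеёжзийклмнопрстуфхцчшщъыьэюя';
const RU_UPPER = 'АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ';
const RU_LATIN = ['a', 'b', 'v', 'g', 'd', 'e', 'yo', 'zh', 'z', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p',
                  'r', 's', 't', 'u', 'f', 'kh', 'ts', 'ch', 'sh', 'sch', '', 'y', '', 'e', 'yu', 'ya'];

// Hypothetical slug helper built on that table.
function ruSlug(string $text): string
{
    static $map = null;
    if ($map === null) {
        // Split a UTF-8 string into individual characters.
        $chars = function (string $s): array {
            return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
        };
        // Both cases map to lowercase Latin, as in ru2lat().
        $map = array_combine($chars(RU_LOWER), RU_LATIN) + array_combine($chars(RU_UPPER), RU_LATIN);
    }
    $slug = strtolower(strtr($text, $map));           // lowercase any remaining ASCII letters
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug); // any other run of characters -> one hyphen
    return trim($slug, '-');
}

echo ruSlug('Главная страница'), PHP_EOL; // glavnaya-stranitsa
echo ruSlug('Привет, мир!'), PHP_EOL;     // privet-mir
```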
Answer by user5720164
This is my version of a transliteration table for the Russian alphabet. It's unofficial, but based on the technical standards GOST 7.79-2000 and GOST 16876-71. Letters that map to multiple Latin characters go first.
public static function transliterate($textcyr = null, $textlat = null) {
    // Multi-character entries come first in BOTH cases; otherwise the Latin -> Cyrillic pass
    // would, for example, turn the "o" of "Yo" into "о" before "Yo" is tried.
    $cyr = array(
        'ё', 'ж', 'х', 'ц', 'ч', 'щ', 'ш', 'ъ', 'э', 'ю', 'я',
        'Ё', 'Ж', 'Х', 'Ц', 'Ч', 'Щ', 'Ш', 'Ъ', 'Э', 'Ю', 'Я',
        'а', 'б', 'в', 'г', 'д', 'е', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'ь',
        'А', 'Б', 'В', 'Г', 'Д', 'Е', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р', 'С', 'Т', 'У', 'Ф', 'Ь');
    $lat = array(
        'yo', 'zh', 'kh', 'ts', 'ch', 'shh', 'sh', '``', 'eh', 'yu', 'ya',
        'Yo', 'Zh', 'Kh', 'Ts', 'Ch', 'Shh', 'Sh', '``', 'Eh', 'Yu', 'Ya',
        'a', 'b', 'v', 'g', 'd', 'e', 'z', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', '`',
        'A', 'B', 'V', 'G', 'D', 'E', 'Z', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'F', '`');
    if ($textcyr)
        return str_replace($cyr, $lat, $textcyr); // Cyrillic -> Latin
    else if ($textlat)
        return str_replace($lat, $cyr, $textlat); // Latin -> Cyrillic
    else
        return null;
}
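str_replace() applies the pairs one after another, so the "multi-characters go first" rule has to hold across both letter cases. A per-letter round trip is a quick way to check a table. The sketch below uses a hypothetical function name and only tests single letters, so it will not catch ambiguities between letter sequences, such as "тс" and "ц" both becoming "ts":

```php
<?php
// Hypothetical checker for str_replace()-based tables given as parallel arrays.
// Returns [letter => what it came back as] for every letter that fails the round trip.
function roundTripFailures(array $cyr, array $lat): array
{
    $failures = [];
    foreach ($cyr as $letter) {
        $back = str_replace($lat, $cyr, str_replace($cyr, $lat, $letter));
        if ($back !== $letter) {
            $failures[$letter] = $back;
        }
    }
    return $failures;
}

// Uppercase digraph listed after the lowercase single letters: 'Ш' comes back as 'Сх'.
print_r(roundTripFailures(['ш', 'с', 'х', 'Ш', 'С', 'Х'], ['sh', 's', 'h', 'Sh', 'S', 'H']));
```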
Answer by Alexander Dolgopolskiy
Following the Yandex transliteration rules (http://www.translityandex.ru/), and also converting uppercase letters:
function translit_russian_filenames( $filename ) {
$info = pathinfo( $filename );
$ext = empty( $info['extension'] ) ? '' : '.' . $info['extension'];
$name = basename( $filename, $ext );
$cyr = array(
'а', 'б', 'в', 'г', 'д', 'е', 'ё', 'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы', 'ь', 'э', 'ю', 'я',
'А', 'Б', 'В', 'Г', 'Д', 'Е', 'Ё', 'Ж', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р', 'С', 'Т', 'У', 'Ф', 'Х', 'Ц', 'Ч', 'Ш', 'Щ', 'Ъ', 'Ы', 'Ь', 'Э', 'Ю', 'Я' );
$lat = array(
'a', 'b', 'v', 'g', 'd', 'e', 'yo', 'zh', 'z', 'i', 'y', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'h', 'c', 'ch', 'sh', 'shch', '', 'y', '', 'e', 'yu', 'ya',
'a', 'b', 'v', 'g', 'd', 'e', 'yo', 'zh', 'z', 'i', 'y', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'h', 'c', 'ch', 'sh', 'shch', '', 'y', '', 'e', 'yu', 'ya');
$name_translit = str_replace($cyr, $lat, $name);
return $name_translit . $ext;
}
add_filter( 'sanitize_file_name', 'translit_russian_filenames', 10 );
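The PHP manual notes that basename() and pathinfo() are locale aware. Multibyte names may therefore be split incorrectly unless a matching locale is set with setlocale(). If that is a concern, one option is to split off the extension by byte position, since '.' is a single byte that never occurs inside a multibyte UTF-8 sequence. The sketch below assumes the input is a bare file name with no directory part and reuses the answer's Yandex letter values. The function name is hypothetical and is not a WordPress API:

```php
<?php
// Hypothetical, locale-independent variant of translit_russian_filenames().
function translitFilename(string $filename): string
{
    // Yandex-style values from the answer; both cases map to lowercase Latin.
    $lower = 'абвгдеёжзийклмнопрстуфхцчшщъыьэюя';
    $upper = 'АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ';
    $latin = ['a', 'b', 'v', 'g', 'd', 'e', 'yo', 'zh', 'z', 'i', 'y', 'k', 'l', 'm', 'n', 'o', 'p',
              'r', 's', 't', 'u', 'f', 'h', 'c', 'ch', 'sh', 'shch', '', 'y', '', 'e', 'yu', 'ya'];
    $chars = function (string $s): array {
        return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    };
    $map = array_combine($chars($lower), $latin) + array_combine($chars($upper), $latin);

    // strrpos() works on bytes, which is safe for the ASCII '.' in UTF-8 text.
    $dot  = strrpos($filename, '.');
    $name = $dot === false ? $filename : substr($filename, 0, $dot);
    $ext  = $dot === false ? '' : substr($filename, $dot);
    return strtr($name, $map) . $ext; // the extension is kept unchanged
}

echo translitFilename('Отчёт за май.pdf'), PHP_EOL; // otchyot za may.pdf
```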