Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/7461406/

Time: 2020-08-26 02:43:46  Source: igfitidea

Cyrillic transliteration in PHP

Tags: php, unicode, transliteration

Asked by Sfisioza

How can I transliterate Cyrillic characters into Latin letters?

E.g. Главная страница -> Glavnaja stranica

This Transliteration PHP Extension would do this very well, but I can't install it on my server.

It would be best to have the same implementation but in PHP.

Answered by Tural Ali

Try the following code:

// Parallel lookup tables: $cyr[i] is replaced by $lat[i].
$textcyr = "Тествам с кирилица";
$textlat = "I pone dotuk raboti!";
$cyr = [
    'а','б','в','г','д','е','ё','ж','з','и','й','к','л','м','н','о','п',
    'р','с','т','у','ф','х','ц','ч','ш','щ','ъ','ы','ь','э','ю','я',
    'А','Б','В','Г','Д','Е','Ё','Ж','З','И','Й','К','Л','М','Н','О','П',
    'Р','С','Т','У','Ф','Х','Ц','Ч','Ш','Щ','Ъ','Ы','Ь','Э','Ю','Я'
];
$lat = [
    'a','b','v','g','d','e','io','zh','z','i','y','k','l','m','n','o','p',
    'r','s','t','u','f','h','ts','ch','sh','sht','a','i','y','e','yu','ya',
    'A','B','V','G','D','E','Io','Zh','Z','I','Y','K','L','M','N','O','P',
    'R','S','T','U','F','H','Ts','Ch','Sh','Sht','A','I','Y','E','Yu','Ya'
];
$textcyr = str_replace($cyr, $lat, $textcyr); // Cyrillic -> Latin
$textlat = str_replace($lat, $cyr, $textlat); // Latin -> Cyrillic
echo("$textcyr $textlat");

Answered by bobef

@Tural Teyyuboglu

Your code has a problem: if you transliterate, for example, "щеки" to Latin and then back to Cyrillic, it produces something like "схтеки". The multi-character sequences must appear first in the arrays, like this:

// Pass Cyrillic text as the first argument, or Latin text as the second.
function transliterate($textcyr = null, $textlat = null) {
    // Multi-character Latin sequences come first so that 'sht' is matched before 'sh' or 's'.
    $cyr = array(
    'ж',  'ч',  'щ',   'ш',  'ю',  'а', 'б', 'в', 'г', 'д', 'е', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ъ', 'ь', 'я',
    'Ж',  'Ч',  'Щ',   'Ш',  'Ю',  'А', 'Б', 'В', 'Г', 'Д', 'Е', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р', 'С', 'Т', 'У', 'Ф', 'Х', 'Ц', 'Ъ', 'Ь', 'Я');
    $lat = array(
    'zh', 'ch', 'sht', 'sh', 'yu', 'a', 'b', 'v', 'g', 'd', 'e', 'z', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'h', 'c', 'y', 'x', 'q',
    'Zh', 'Ch', 'Sht', 'Sh', 'Yu', 'A', 'B', 'V', 'G', 'D', 'E', 'Z', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'F', 'H', 'C', 'Y', 'X', 'Q');
    if ($textcyr) return str_replace($cyr, $lat, $textcyr);
    elseif ($textlat) return str_replace($lat, $cyr, $textlat);
    else return null;
}

echo transliterate(null, transliterate("щеки")) == "щеки";
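The ordering problem can also be sidestepped with strtr(): when given an array, it tries the longest keys first and never rescans text it has already replaced, so the order of the table no longer matters. A minimal sketch, assuming hypothetical helper names cyr_to_lat() and lat_to_cyr() and a deliberately tiny mapping table (not a full alphabet):

```php
<?php
// Hypothetical table for illustration only; it covers just a few letters.
const CYR_TO_LAT = [
    'щ' => 'sht', 'ш' => 'sh', 'с' => 's', 'х' => 'h', 'т' => 't',
    'е' => 'e',   'к' => 'k',  'и' => 'i',
];

function cyr_to_lat(string $text): string
{
    // strtr() with an array replaces whole keys and skips over its own output.
    return strtr($text, CYR_TO_LAT);
}

function lat_to_cyr(string $text): string
{
    // array_flip() reverses the table; strtr() matches the longest key first,
    // so 'sht' wins over 'sh' and 's' regardless of array order.
    return strtr($text, array_flip(CYR_TO_LAT));
}

echo cyr_to_lat('щеки'), "\n";             // shteki
echo lat_to_cyr(cyr_to_lat('щеки')), "\n"; // щеки
```

Round-tripping is still lossy wherever two Cyrillic spellings share one Latin spelling: with this table, "с" followed by "х" becomes "sh", which converts back to "ш".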

Answered by Ilyich

The best option is to use the PHP Intl extension. You might need to install it first.

This will do the trick:

$transliteratedString = transliterator_transliterate('Russian-Latin/BGN', $cyrillicString);

I used 'Russian-Latin/BGN' because the asker used Russian in the question. However, there are options for other languages written in the Cyrillic script. To list all of them, do this:

print_r(transliterator_list_ids());
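A slightly fuller sketch of the Intl approach, with error handling. The helper name to_latin_ascii() is hypothetical, and the compound ID 'Any-Latin; Latin-ASCII' is a swapped-in ICU transform chain (any script to Latin, then strip diacritics) rather than the 'Russian-Latin/BGN' ID used above. It assumes the intl extension is loaded; the exact spelling of the result depends on the ICU version, so no expected output is shown.

```php
<?php
// Hypothetical helper; requires the intl extension (ICU).
function to_latin_ascii(string $text): string
{
    if (!function_exists('transliterator_transliterate')) {
        throw new RuntimeException('The intl extension is not available.');
    }

    // transliterator_transliterate() returns false on failure.
    $result = transliterator_transliterate('Any-Latin; Latin-ASCII', $text);
    if ($result === false) {
        throw new RuntimeException('Transliteration failed: ' . intl_get_error_message());
    }

    return $result;
}

// Only run the demo when intl is actually present.
if (function_exists('transliterator_transliterate')) {
    echo to_latin_ascii('Главная страница'), "\n";
}
```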

Answered by Boris Janjetovic

Here is a function that I use for cleaning up Bosnian, Croatian, and Serbian characters:

function cleanUTF($name){
    // Strip diacritics from Latin letters.
    $name = str_replace(array('š','č','đ','č','ć','ž','ñ'),array('s','c','d','c','c','z','n'), $name);
    $name = str_replace(array('Š','Č','Đ','Č','Ć','Ž','Ñ'),array('S','C','D','C','C','Z','N'), $name);
    // Map Cyrillic letters (including Serbian љ, њ, џ) to Latin.
    $name = str_replace(array('а','б','в','г','д','е','ё','ж','з','и','й','к','л','љ','м','н','њ','о','п','р','с','т','у','ф','х','ц','ч','џ','ш','щ','ъ','ы','ь','э','ю','я','А','Б','В','Г','Д','Е','Ё','Ж','З','И','Й','К','Л','Љ','М','Н','Њ','О','П','Р','С','Т','У','Ф','Х','Ц','Ч','Џ','Ш','Щ','Ъ','Ы','Ь','Э','Ю','Я'),
                        array('a','b','v','g','d','e','e','z','z','i','j','k','l','lj','m','n','nj','o','p','r','s','t','u','f','h','c','c','dz','s','s','i','j','j','e','ju','ja','A','B','V','G','D','E','E','Z','Z','I','J','K','L','Lj','M','N','Nj','O','P','R','S','T','U','F','H','C','C','Dz','S','S','I','J','J','E','Ju','Ja'), $name);
    return $name;
}

Answered by Kerrek SB

You should try iconv() with the //TRANSLIT option.

$trstr = iconv(<your encoding here>, "ISO-8859-1//TRANSLIT", $src_str);
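What //TRANSLIT produces depends on the underlying iconv implementation and, on some systems, the current locale; for Cyrillic input some setups return "?" characters or fail outright. The sketch below therefore only shows how to call it defensively. It assumes UTF-8 input, swaps the target charset to ASCII for illustration, and uses a hypothetical helper name, iconv_translit().

```php
<?php
// Hypothetical helper; assumes $text is valid UTF-8.
function iconv_translit(string $text): ?string
{
    // The @ operator silences the notice iconv() raises when a character
    // cannot be represented in the target charset.
    $result = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    // iconv() returns false on failure; report that as null.
    return $result === false ? null : $result;
}

$out = iconv_translit('Главная страница');
echo $out === null ? "iconv could not convert the input\n" : $out . "\n";
```

Checking the result on the target server before relying on it is worthwhile, since the same call can behave differently between systems.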

Answered by Av007

$textcyr = "Тест на кирилице";
$textlat = "Test na kirilitse!";
// Parallel arrays: $cyr[i] maps to $lat[i], so both must list the same letters in the same order.
$cyr = array(
    'а','б','в','г','д','е','ё','ж','з','и','й','к','л','м','н','о','п','р','с','т','у',
    'ф','х','ц','ч','ш','щ','ъ','ы','ь','э','ю','я',
    'А','Б','В','Г','Д','Е','Ё','Ж','З','И','Й','К','Л','М','Н','О','П','Р','С','Т','У',
    'Ф','Х','Ц','Ч','Ш','Щ','Ъ','Ы','Ь','Э','Ю','Я');
$lat = array(
    'a','b','v','g','d','e','io','zh','z','i','y','k','l','m','n','o','p','r','s','t','u',
    'f','h','ts','ch','sh','sht','a','i','y','e','yu','ya',
    'A','B','V','G','D','E','Io','Zh','Z','I','Y','K','L','M','N','O','P','R','S','T','U',
    'F','H','Ts','Ch','Sh','Sht','A','I','Y','E','Yu','Ya');

$textcyr = str_replace($cyr, $lat, $textcyr); // Cyrillic -> Latin
$textlat = str_replace($lat, $cyr, $textlat); // Latin -> Cyrillic
echo("$textcyr $textlat");

This version adds the previously missing ё, э, ы (Ё, Э, Ы) letters.

Answered by Tomasz Kapłoński

I wrote a full UTF-8 transliteration class for all European languages. It may help. The comments are in Polish, but there aren't many of them, so here are a few hints:

  1. Numbers stored in constants are idCountry values in my local database; change them as you like.
  2. "Rób transliterację dla " means "do transliteration for "; the country is determined by the constant name.
  3. "Słownik tłumaczący rosyjską cyrylicę wg standardu " means "dictionary with transliteration by standard ".
  4. "Tablica wycinająca akcenty z różnych znaków narodowych pobrana z http://stuffofinterest.com/misc/utf8-about.html" means "Array to cut off accents from different languages" (it might help if you find errors in iconv or cannot use it for some reason).
  5. The methods utf2ascii and cyr2lat are pretty obvious.

Hope it will help a few people 'cause implementing it was a nightmare :)

Edit: I just noticed that part of the code is missing, so I've put the full class on Pastie: class

Answered by pc_

This one worked best for me. The code is from this page.

function ru2lat($str)
{
    $tr = array(
    "А"=>"a", "Б"=>"b", "В"=>"v", "Г"=>"g", "Д"=>"d",
    "Е"=>"e", "Ё"=>"yo", "Ж"=>"zh", "З"=>"z", "И"=>"i", 
    "Й"=>"j", "К"=>"k", "Л"=>"l", "М"=>"m", "Н"=>"n", 
    "О"=>"o", "П"=>"p", "Р"=>"r", "С"=>"s", "Т"=>"t", 
    "У"=>"u", "Ф"=>"f", "Х"=>"kh", "Ц"=>"ts", "Ч"=>"ch", 
    "Ш"=>"sh", "Щ"=>"sch", "Ъ"=>"", "Ы"=>"y", "Ь"=>"", 
    "Э"=>"e", "Ю"=>"yu", "Я"=>"ya", "а"=>"a", "б"=>"b", 
    "в"=>"v", "г"=>"g", "д"=>"d", "е"=>"e", "ё"=>"yo", 
    "ж"=>"zh", "з"=>"z", "и"=>"i", "й"=>"j", "к"=>"k", 
    "л"=>"l", "м"=>"m", "н"=>"n", "о"=>"o", "п"=>"p", 
    "р"=>"r", "с"=>"s", "т"=>"t", "у"=>"u", "ф"=>"f", 
    "х"=>"kh", "ц"=>"ts", "ч"=>"ch", "ш"=>"sh", "щ"=>"sch", 
    "ъ"=>"", "ы"=>"y", "ь"=>"", "э"=>"e", "ю"=>"yu", 
    "я"=>"ya", " "=>"-", "."=>"", ","=>"", "/"=>"-",  
    ":"=>"", ";"=>"","—"=>"", "–"=>"-"
    );
    return strtr($str, $tr);
}

Hope this helps someone.

Answered by user5720164

This is my version of a transliteration table for the Russian alphabet. It's unofficial but based on the technical standards GOST 7.79-2000 and GOST 16876-71. Multi-character sequences go first.

public static function transliterate($textcyr = null, $textlat = null) {
    $cyr = array(
        'ё',  'ж',  'х',  'ц',  'ч',  'щ',   'ш',  'ъ',  'э',  'ю',  'я',  'а', 'б', 'в', 'г', 'д', 'е', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'ь',
        'Ё',  'Ж',  'Х',  'Ц',  'Ч',  'Щ',   'Ш',  'Ъ',  'Э',  'Ю',  'Я',  'А', 'Б', 'В', 'Г', 'Д', 'Е', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р', 'С', 'Т', 'У', 'Ф', 'Ь');
    $lat = array(
        'yo', 'zh', 'kh', 'ts', 'ch', 'shh', 'sh', '``', 'eh', 'yu', 'ya', 'a', 'b', 'v', 'g', 'd', 'e', 'z', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', '`',
        'Yo', 'Zh', 'Kh', 'Ts', 'Ch', 'Shh', 'Sh', '``', 'Eh', 'Yu', 'Ya', 'A', 'B', 'V', 'G', 'D', 'E', 'Z', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'F', '`');
    if($textcyr)
        return str_replace($cyr, $lat, $textcyr);
    else if($textlat)
        return str_replace($lat, $cyr, $textlat);
    else
        return null;
}

Answered by Alexander Dolgopolskiy

This follows the Yandex transliteration rules (http://www.translityandex.ru/) and converts upper case to lower case:

function translit_russian_filenames( $filename ) {
    $info = pathinfo( $filename );
    $ext  = empty( $info['extension'] ) ? '' : '.' . $info['extension'];
    $name = basename( $filename, $ext );
    $cyr = array(
    'а', 'б', 'в', 'г', 'д', 'е', 'ё', 'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы', 'ь', 'э', 'ю', 'я',
    'А', 'Б', 'В', 'Г', 'Д', 'Е', 'Ё', 'Ж', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р', 'С', 'Т', 'У', 'Ф', 'Х', 'Ц', 'Ч', 'Ш', 'Щ', 'Ъ', 'Ы', 'Ь', 'Э', 'Ю', 'Я' );
    $lat = array(
    'a', 'b', 'v', 'g', 'd', 'e', 'yo', 'zh', 'z', 'i', 'y', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'h', 'c', 'ch', 'sh', 'shch', '', 'y', '', 'e', 'yu', 'ya',
    'a', 'b', 'v', 'g', 'd', 'e', 'yo', 'zh', 'z', 'i', 'y', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'f', 'h', 'c', 'ch', 'sh', 'shch', '', 'y', '', 'e', 'yu', 'ya');
    $name_translit = str_replace($cyr, $lat, $name);
    return $name_translit . $ext;
}
add_filter( 'sanitize_file_name', 'translit_russian_filenames', 10 );