Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1890854/

Time: 2020-08-25 04:11:59  Source: igfitidea

How to replace special characters with the ones they're based on in PHP?

Tags: php, string, replace, special-characters, character-encoding

Asked by Carlos

How do I replace:

  • "?" with "a"
  • "é" with "e"

in PHP? Is this possible? I've read somewhere that I could do some math with the ASCII value of the base character and the ASCII value of the accent, but I can't find any references now.

Accepted answer by McPherrinM

This answer is incorrect. I didn't understand Unicode Normalization when I wrote it. Look at francadaval's comment and link.

Check out the Normalizer class to do this. The documentation is good, so I'll just link it instead of repeating things here:

http://www.php.net/manual/en/class.normalizer.php

Specifically, the normalize member of that class:

http://www.php.net/manual/en/normalizer.normalize.php

Note that Unicode normalization has several forms, and you seem to want Normalization Form KD (NFKD) Compatibility Decomposition, though you should read the documentation to make sure.

You shouldn't try to roll your own function for this: there are way too many things that can go wrong, and using the provided function is a much better idea.
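
As the note at the top of this answer admits, normalization on its own does not drop the accents: it only splits each accented letter into a base letter plus a combining mark, and the marks still have to be removed afterwards. A minimal sketch of that two-step idea, assuming the intl extension is installed (the helper name `strip_diacritics_sketch` is made up for illustration):

```php
<?php
// Hypothetical helper name; requires the intl extension, which provides Normalizer.
function strip_diacritics_sketch(string $text): string
{
    // Canonical decomposition (NFD): "é" becomes "e" followed by U+0301.
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        // normalize() returns false on invalid input; fall back to the original.
        return $text;
    }
    // \p{Mn} matches nonspacing marks, i.e. the combining accents.
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed);
    return $stripped === null ? $text : $stripped;
}

// Only run the demo when the Normalizer class is actually available.
if (class_exists('Normalizer')) {
    echo strip_diacritics_sketch("Crème brûlée"), "\n"; // Creme brulee
}
```

Letters that have no decomposition, such as "ø" or "ß", pass through unchanged, so they would still need an explicit mapping.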

Answer by Alix Axel

If you don't have access to the Normalizer class, or just don't wish to use it, you can use the following function to replace most (all?) of the common accented characters.

function Unaccent($string)
{
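    // Encode accented letters as HTML entities (é => &eacute;), then keep only the base letter(s) captured in group 1.
    return preg_replace('~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1', htmlentities($string, ENT_QUOTES, 'UTF-8'));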
}
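
To see why the entity trick works, it may help to watch one string go through it. The sketch below is only an illustration: `unaccent_via_entities` is a made-up name, and the final `html_entity_decode` call is my own addition so that unrelated entities such as `&amp;` are turned back into characters.

```php
<?php
// Hypothetical helper: same entity trick, plus decoding of the leftover entities.
function unaccent_via_entities(string $text): string
{
    // "é" becomes "&eacute;", "û" becomes "&ucirc;", "æ" becomes "&aelig;", and so on.
    $entities = htmlentities($text, ENT_QUOTES, 'UTF-8');
    // Keep only the base letter(s) captured in group 1.
    $plain = preg_replace(
        '~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i',
        '$1',
        $entities
    );
    // Turn entities that were not accents (such as "&amp;") back into characters.
    return html_entity_decode($plain, ENT_QUOTES, 'UTF-8');
}

echo unaccent_via_entities("Crème brûlée & café"), "\n"; // Creme brulee & cafe
```

This only covers characters that have a named HTML entity ending in one of the listed suffixes, so it is a convenience for Western European text rather than a general solution.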

Answer by rcaceres

For those who don't have PHP 5.3, I found another solution that works well and seems very comprehensive. Here is a link to the author's website: http://www.evaisse.net/2008/php-translit-remove-accent-unaccent-21001. Here is the function.

/**
 * Unaccent the input string. An example string like `à?????ζ?Бю`
 * will be translated to `AOeyIOzoBY`. More complete than :
 *   strtr( (string)$str,
 *          "àá????àáa???òó????òó????èéê?èéê???ìí??ìí??ùú?üùú?ü???",
 *          "aaaaaaaaaaaaooooooooooooeeeeeeeecciiiiiiiiuuuuuuuuynn" );
 *
 * @param $str input string
 * @param $utf8 if null, function will detect input string encoding
 * @author http://www.evaisse.net/2008/php-translit-remove-accent-unaccent-21001
 * @return string input string without accent
 */
function remove_accents( $str, $utf8=true )
{
    $str = (string)$str;
    if( is_null($utf8) ) {
        if( function_exists('mb_detect_encoding') ) { // use mbstring when it is available
            $utf8 = (strtolower( mb_detect_encoding($str) )=='utf-8');
        } else {
            $length = strlen($str);
            $utf8 = true;
            for ($i=0; $i < $length; $i++) {
                $c = ord($str[$i]);
                if ($c < 0x80) $n = 0; # 0bbbbbbb
                elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb
                elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb
                elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb
                elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb
                elseif (($c & 0xFE) == 0xFC) $n=5; # 1111110b
                else { $utf8 = false; break; } # Does not match any model
                for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
                    if ((++$i == $length)
                        || ((ord($str[$i]) & 0xC0) != 0x80)) {
                        $utf8 = false;
                        break;
                    }

                }
            }
        }

    }

    if(!$utf8)
        $str = utf8_encode($str);

    $transliteration = array(
    '?' => 'I', '?' => 'O','?' => 'O','ü' => 'U','?' => 'a','?' => 'a',
    '?' => 'i','?' => 'o','?' => 'o','ü' => 'u','?' => 's','?' => 's',
    'à' => 'A','á' => 'A','?' => 'A','?' => 'A','?' => 'A','?' => 'A',
    '?' => 'A','ā' => 'A','?' => 'A','?' => 'A','?' => 'C','?' => 'C',
    '?' => 'C','?' => 'C','?' => 'C','?' => 'D','?' => 'D','è' => 'E',
    'é' => 'E','ê' => 'E','?' => 'E','ē' => 'E','?' => 'E','ě' => 'E',
    '?' => 'E','?' => 'E','?' => 'G','?' => 'G','?' => 'G','?' => 'G',
    '?' => 'H','?' => 'H','ì' => 'I','í' => 'I','?' => 'I','?' => 'I',
    'ī' => 'I','?' => 'I','?' => 'I','?' => 'I','?' => 'I','?' => 'J',
    '?' => 'K','?' => 'K','?' => 'K','?' => 'K','?' => 'K','?' => 'L',
    '?' => 'N','?' => 'N','?' => 'N','?' => 'N','?' => 'N','ò' => 'O',
    'ó' => 'O','?' => 'O','?' => 'O','?' => 'O','ō' => 'O','?' => 'O',
    '?' => 'O','?' => 'R','?' => 'R','?' => 'R','?' => 'S','?' => 'S',
    '?' => 'S','?' => 'S','?' => 'S','?' => 'T','?' => 'T','?' => 'T',
    '?' => 'T','ù' => 'U','ú' => 'U','?' => 'U','ū' => 'U','?' => 'U',
    '?' => 'U','?' => 'U','?' => 'U','?' => 'U','?' => 'W','?' => 'Y',
    '?' => 'Y','Y' => 'Y','?' => 'Z','?' => 'Z','?' => 'Z','à' => 'a',
    'á' => 'a','a' => 'a','?' => 'a','ā' => 'a','?' => 'a','?' => 'a',
    '?' => 'a','?' => 'c','?' => 'c','?' => 'c','?' => 'c','?' => 'c',
    '?' => 'd','?' => 'd','è' => 'e','é' => 'e','ê' => 'e','?' => 'e',
    'ē' => 'e','?' => 'e','ě' => 'e','?' => 'e','?' => 'e','?' => 'f',
    '?' => 'g','?' => 'g','?' => 'g','?' => 'g','?' => 'h','?' => 'h',
    'ì' => 'i','í' => 'i','?' => 'i','?' => 'i','ī' => 'i','?' => 'i',
    '?' => 'i','?' => 'i','?' => 'i','?' => 'j','?' => 'k','?' => 'k',
    '?' => 'l','?' => 'l','?' => 'l','?' => 'l','?' => 'l','?' => 'n',
    'ń' => 'n','ň' => 'n','?' => 'n','?' => 'n','?' => 'n','ò' => 'o',
    'ó' => 'o','?' => 'o','?' => 'o','?' => 'o','ō' => 'o','?' => 'o',
    '?' => 'o','?' => 'r','?' => 'r','?' => 'r','?' => 's','?' => 's',
    '?' => 't','ù' => 'u','ú' => 'u','?' => 'u','ū' => 'u','?' => 'u',
    '?' => 'u','?' => 'u','?' => 'u','?' => 'u','?' => 'w','?' => 'y',
    'y' => 'y','?' => 'y','?' => 'z','?' => 'z','?' => 'z','Α' => 'A',
    '?' => 'A','?' => 'A','?' => 'A','?' => 'A','?' => 'A','?' => 'A',
    '?' => 'A','?' => 'A','?' => 'A','?' => 'A','?' => 'A','?' => 'A',
    '?' => 'A','?' => 'A','?' => 'A','?' => 'A','?' => 'A','?' => 'A',
    '?' => 'A','?' => 'A','?' => 'A','Β' => 'B','Γ' => 'G','Δ' => 'D',
    'Ε' => 'E','?' => 'E','?' => 'E','?' => 'E','?' => 'E','?' => 'E',
    '?' => 'E','?' => 'E','?' => 'E','Ζ' => 'Z','Η' => 'I','?' => 'I',
    '?' => 'I','?' => 'I','?' => 'I','?' => 'I','?' => 'I','?' => 'I',
    '?' => 'I','?' => 'I','?' => 'I','?' => 'I','?' => 'I','?' => 'I',
    '?' => 'I','?' => 'I','?' => 'I','?' => 'I','?' => 'I','?' => 'I',
    'Θ' => 'T','Ι' => 'I','?' => 'I','?' => 'I','?' => 'I','?' => 'I',
    '?' => 'I','?' => 'I','?' => 'I','?' => 'I','?' => 'I','?' => 'I',
    '?' => 'I','?' => 'I','?' => 'I','Κ' => 'K','Λ' => 'L','Μ' => 'M',
    'Ν' => 'N','Ξ' => 'K','Ο' => 'O','?' => 'O','?' => 'O','?' => 'O',
    '?' => 'O','?' => 'O','?' => 'O','?' => 'O','?' => 'O','Π' => 'P',
    'Ρ' => 'R','?' => 'R','Σ' => 'S','Τ' => 'T','Υ' => 'Y','?' => 'Y',
    '?' => 'Y','?' => 'Y','?' => 'Y','?' => 'Y','?' => 'Y','?' => 'Y',
    '?' => 'Y','?' => 'Y','Φ' => 'F','Χ' => 'X','Ψ' => 'P','Ω' => 'O',
    '?' => 'O','?' => 'O','?' => 'O','?' => 'O','?' => 'O','?' => 'O',
    '?' => 'O','?' => 'O','?' => 'O','?' => 'O','?' => 'O','?' => 'O',
    '?' => 'O','?' => 'O','?' => 'O','?' => 'O','?' => 'O','?' => 'O',
    '?' => 'O','α' => 'a','?' => 'a','?' => 'a','?' => 'a','?' => 'a',
    '?' => 'a','?' => 'a','?' => 'a','?' => 'a','?' => 'a','?' => 'a',
    '?' => 'a','?' => 'a','?' => 'a','?' => 'a','?' => 'a','?' => 'a',
    '?' => 'a','?' => 'a','?' => 'a','?' => 'a','?' => 'a','?' => 'a',
    '?' => 'a','?' => 'a','?' => 'a','β' => 'b','γ' => 'g','δ' => 'd',
    'ε' => 'e','?' => 'e','?' => 'e','?' => 'e','?' => 'e','?' => 'e',
    '?' => 'e','?' => 'e','?' => 'e','ζ' => 'z','η' => 'i','?' => 'i',
    '?' => 'i','?' => 'i','?' => 'i','?' => 'i','?' => 'i','?' => 'i',
    '?' => 'i','?' => 'i','?' => 'i','?' => 'i','?' => 'i','?' => 'i',
    '?' => 'i','?' => 'i','?' => 'i','?' => 'i','?' => 'i','?' => 'i',
    '?' => 'i','?' => 'i','?' => 'i','?' => 'i','θ' => 't','ι' => 'i',
    '?' => 'i','?' => 'i','?' => 'i','?' => 'i','?' => 'i','?' => 'i',
    '?' => 'i','?' => 'i','?' => 'i','?' => 'i','?' => 'i','?' => 'i',
    '?' => 'i','?' => 'i','?' => 'i','?' => 'i','?' => 'i','κ' => 'k',
    'λ' => 'l','μ' => 'm','ν' => 'n','ξ' => 'k','ο' => 'o','?' => 'o',
    '?' => 'o','?' => 'o','?' => 'o','?' => 'o','?' => 'o','?' => 'o',
    '?' => 'o','π' => 'p','ρ' => 'r','?' => 'r','?' => 'r','σ' => 's',
    '?' => 's','τ' => 't','υ' => 'y','?' => 'y','?' => 'y','?' => 'y',
    '?' => 'y','?' => 'y','?' => 'y','?' => 'y','?' => 'y','?' => 'y',
    '?' => 'y','?' => 'y','?' => 'y','?' => 'y','?' => 'y','?' => 'y',
    '?' => 'y','?' => 'y','φ' => 'f','χ' => 'x','ψ' => 'p','ω' => 'o',
    '?' => 'o','?' => 'o','?' => 'o','?' => 'o','?' => 'o','?' => 'o',
    '?' => 'o','?' => 'o','?' => 'o','?' => 'o','?' => 'o','?' => 'o',
    '?' => 'o','?' => 'o','?' => 'o','?' => 'o','?' => 'o','?' => 'o',
    '?' => 'o','?' => 'o','?' => 'o','?' => 'o','?' => 'o','А' => 'A',
    'Б' => 'B','В' => 'V','Г' => 'G','Д' => 'D','Е' => 'E','Ё' => 'E',
    'Ж' => 'Z','З' => 'Z','И' => 'I','Й' => 'I','К' => 'K','Л' => 'L',
    'М' => 'M','Н' => 'N','О' => 'O','П' => 'P','Р' => 'R','С' => 'S',
    'Т' => 'T','У' => 'U','Ф' => 'F','Х' => 'K','Ц' => 'T','Ч' => 'C',
    'Ш' => 'S','Щ' => 'S','Ы' => 'Y','Э' => 'E','Ю' => 'Y','Я' => 'Y',
    'а' => 'A','б' => 'B','в' => 'V','г' => 'G','д' => 'D','е' => 'E',
    'ё' => 'E','ж' => 'Z','з' => 'Z','и' => 'I','й' => 'I','к' => 'K',
    'л' => 'L','м' => 'M','н' => 'N','о' => 'O','п' => 'P','р' => 'R',
    'с' => 'S','т' => 'T','у' => 'U','ф' => 'F','х' => 'K','ц' => 'T',
    'ч' => 'C','ш' => 'S','щ' => 'S','ы' => 'Y','э' => 'E','ю' => 'Y',
    'я' => 'Y','e' => 'd','D' => 'D','t' => 't','T' => 'T','?' => 'a',
    '?' => 'b','?' => 'g','?' => 'd','?' => 'e','?' => 'v','?' => 'z',
    '?' => 't','?' => 'i','?' => 'k','?' => 'l','?' => 'm','?' => 'n',
    '?' => 'o','?' => 'p','?' => 'z','?' => 'r','?' => 's','?' => 't',
    '?' => 'u','?' => 'p','?' => 'k','?' => 'g','?' => 'q','?' => 's',
    '?' => 'c','?' => 't','?' => 'd','?' => 't','?' => 'c','?' => 'k',
    '?' => 'j','?' => 'h'
    );
    $str = str_replace( array_keys( $transliteration ),
                        array_values( $transliteration ),
                        $str);
    return $str;
}
//- remove_accents()

Answer by quantme

Short str_replace use with custom chars:

<?php
  $original_string    = "¿Dónde está el niño que vive aquí? En el témpano o en el iglú. ÁFRICA, MÉXICO, ÍNDICE, CANCIÓN y NÚMERO.";

  $some_special_chars = array("á", "é", "í", "ó", "ú", "Á", "É", "Í", "Ó", "Ú", "ñ", "Ñ");
  $replacement_chars  = array("a", "e", "i", "o", "u", "A", "E", "I", "O", "U", "n", "N");

  $replaced_string    = str_replace($some_special_chars, $replacement_chars, $original_string);

  echo $replaced_string; // outputs '¿Donde esta el nino que vive aqui? En el tempano o en el iglu. AFRICA, MEXICO, INDICE, CANCION y NUMERO.'
?>

Answer by lethargy

If none of the other solutions are working right for you, here's what worked for me:

<?php

$string = "áéíóúá—whatever";

// create an array of the hex codes of the characters you want to replace (formatted as shown) and whatever you want to replace them with.
$characters = array(
  "[\xF3]" => "&oacute;", //ó
  "[\xFC]" => "&uuml;", //ü
  "[\xF1]" => "&ntilde;", //ñ
  "[\xEB]" => "&euml;", //ë
  "[\xE9]" => "&eacute;", //é
  "[\xBD]" => "&frac12;", //½
);
// note that you must use a two-digit hex code for whatever reason.
// So, for example, although the hex code for an em dash is 2014, you have to use 97 instead. ("[\x97]" => "&mdash;")

// separate the key->value array into two separate arrays. Or just make two arrays from the beginning, but it's easier to read this way.
foreach ($characters as $hex => $html) {
  $replaceThis[] = $hex;
  $replaceWith[] = $html;
}

$string = preg_replace($replaceThis, $replaceWith, $string);

?>

It may not be the most elegant solution, but it works and requires no knowledge of regular expressions.
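
A likely explanation for the "two-digit hex code" remark in the code comments: the strings in question were probably single-byte Windows-1252 text rather than UTF-8, and in Windows-1252 the em dash is the single byte 0x97. That is my assumption, not something the answer states. If it holds, converting the input to UTF-8 once may be simpler than listing bytes by hand. A small sketch using `iconv`:

```php
<?php
// Assumption: the input bytes are Windows-1252, not UTF-8.
// The bytes below spell "café", an em dash, and "niño" in Windows-1252.
$legacy = "caf\xE9 \x97 ni\xF1o";

// Convert once, then work with ordinary UTF-8 strings from here on.
$utf8 = iconv('Windows-1252', 'UTF-8', $legacy);

// The single byte 0x97 becomes the three-byte UTF-8 sequence for U+2014.
echo bin2hex("\x97"), ' -> ', bin2hex(iconv('Windows-1252', 'UTF-8', "\x97")), "\n"; // 97 -> e28094
```

After the conversion, any of the UTF-8 based approaches in the other answers can be applied to `$utf8`.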

Answer by question_about_the_problem

Especially when matching texts against each other or against keywords, it is helpful to normalize the texts beforehand. The following function removes all diacritics (marks like accents) from a given UTF-8 encoded text and returns ASCII text.

Be sure to have the PHP-Normalizer-extension (intl and icu) installed.

Tip: You may also want to map the text to lower case before executing matching procedures ...

<?php

function normalizeUtf8String( $s)
{
    // keep the untouched input so it can be returned if anything fails
    $original_string = $s;

    // Normalizer-class missing!
    if (! class_exists("Normalizer", $autoload = false))
        return $original_string;


    // maps German (umlauts) and other European characters onto two characters before just removing diacritics
    $s    = preg_replace( '@\x{00c4}@u'    , "AE",    $s );    // umlaut Ä => AE
    $s    = preg_replace( '@\x{00d6}@u'    , "OE",    $s );    // umlaut Ö => OE
    $s    = preg_replace( '@\x{00dc}@u'    , "UE",    $s );    // umlaut Ü => UE
    $s    = preg_replace( '@\x{00e4}@u'    , "ae",    $s );    // umlaut ä => ae
    $s    = preg_replace( '@\x{00f6}@u'    , "oe",    $s );    // umlaut ö => oe
    $s    = preg_replace( '@\x{00fc}@u'    , "ue",    $s );    // umlaut ü => ue
    $s    = preg_replace( '@\x{00f1}@u'    , "ny",    $s );    // ñ => ny
    $s    = preg_replace( '@\x{00ff}@u'    , "yu",    $s );    // ÿ => yu


    // maps special characters (characters with diacritics) on their base-character followed by the diacritical mark
        // example:  Ú => U´,  á => a´
    $s    = Normalizer::normalize( $s, Normalizer::FORM_D );


    $s    = preg_replace( '@\pM@u'        , "",    $s );    // removes diacritics


    $s    = preg_replace( '@\x{00df}@u'    , "ss",    $s );    // maps German ß onto ss
    $s    = preg_replace( '@\x{00c6}@u'    , "AE",    $s );    // Æ => AE
    $s    = preg_replace( '@\x{00e6}@u'    , "ae",    $s );    // æ => ae
    $s    = preg_replace( '@\x{0132}@u'    , "IJ",    $s );    // Ĳ => IJ
    $s    = preg_replace( '@\x{0133}@u'    , "ij",    $s );    // ĳ => ij
    $s    = preg_replace( '@\x{0152}@u'    , "OE",    $s );    // Œ => OE
    $s    = preg_replace( '@\x{0153}@u'    , "oe",    $s );    // œ => oe

    $s    = preg_replace( '@\x{00d0}@u'    , "D",    $s );    // Ð => D
    $s    = preg_replace( '@\x{0110}@u'    , "D",    $s );    // Đ => D
    $s    = preg_replace( '@\x{00f0}@u'    , "d",    $s );    // ð => d
    $s    = preg_replace( '@\x{0111}@u'    , "d",    $s );    // đ => d
    $s    = preg_replace( '@\x{0126}@u'    , "H",    $s );    // Ħ => H
    $s    = preg_replace( '@\x{0127}@u'    , "h",    $s );    // ħ => h
    $s    = preg_replace( '@\x{0131}@u'    , "i",    $s );    // ı => i
    $s    = preg_replace( '@\x{0138}@u'    , "k",    $s );    // ĸ => k
    $s    = preg_replace( '@\x{013f}@u'    , "L",    $s );    // Ŀ => L
    $s    = preg_replace( '@\x{0141}@u'    , "L",    $s );    // Ł => L
    $s    = preg_replace( '@\x{0140}@u'    , "l",    $s );    // ŀ => l
    $s    = preg_replace( '@\x{0142}@u'    , "l",    $s );    // ł => l
    $s    = preg_replace( '@\x{014a}@u'    , "N",    $s );    // Ŋ => N
    $s    = preg_replace( '@\x{0149}@u'    , "n",    $s );    // ŉ => n
    $s    = preg_replace( '@\x{014b}@u'    , "n",    $s );    // ŋ => n
    $s    = preg_replace( '@\x{00d8}@u'    , "O",    $s );    // Ø => O
    $s    = preg_replace( '@\x{00f8}@u'    , "o",    $s );    // ø => o
    $s    = preg_replace( '@\x{017f}@u'    , "s",    $s );    // ſ => s
    $s    = preg_replace( '@\x{00de}@u'    , "T",    $s );    // Þ => T
    $s    = preg_replace( '@\x{0166}@u'    , "T",    $s );    // Ŧ => T
    $s    = preg_replace( '@\x{00fe}@u'    , "t",    $s );    // þ => t
    $s    = preg_replace( '@\x{0167}@u'    , "t",    $s );    // ŧ => t

    // remove all non-ASCii characters
    $s    = preg_replace( '@[^\0-\x80]@u'    , "",    $s );


    // possible errors in UTF8-regular-expressions
    if (empty($s))
        return $original_string;
    else
        return $s;
}
?>

The above function is mainly based on the following article: http://ahinea.com/en/tech/accented-translate.html

Answer by eleg

Use PEAR I18N_UnicodeNormalizer-1.0.0:

include('…');

echo preg_replace(
 '/(\P{L})/ui', // replace all except members of Unicode class "letters", case insensitive
 '', // with nothing
 I18N_UnicodeNormalizer::toNFKD('?é??ù?é??ù') // ù → u + `
);

→ AEIOUaeiou

Answer by Pascal MARTIN

People often use str_replace or strtr and a big list of characters to convert "from" and "to", even if that doesn't look quite pretty...
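
For what it's worth, here is roughly what the strtr variant could look like. The map is a deliberately tiny, made-up example. Passing an array matters for UTF-8 input, because the three-argument form of strtr works byte by byte and would mangle multibyte characters.

```php
<?php
// Hypothetical, deliberately tiny map; extend it with the characters you need.
$map = [
    'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u',
    'ñ' => 'n', 'ü' => 'u', 'ß' => 'ss', 'æ' => 'ae',
];

// With an array argument, strtr() replaces whole (multibyte) substrings,
// tries the longest key first, and never re-scans text it already replaced.
echo strtr("El niño está aquí", $map), "\n"; // El nino esta aqui
```

The same array can be fed to str_replace via array_keys and array_values, but strtr avoids replacing text that an earlier replacement produced.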

Another solution, I suppose, might be using something like iconv with the option //TRANSLIT, but that doesn't always work, from what I remember...
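
A rough sketch of the iconv idea, with the caveat spelled out: the result depends on the iconv implementation and on the current locale, so no particular output is promised. The helper name `translit_sketch` and the `en_US.UTF-8` locale are assumptions for illustration only.

```php
<?php
// Hypothetical helper. Output varies between systems, so treat it as best effort.
function translit_sketch(string $text): string
{
    // Some implementations need a UTF-8 locale to pick sensible replacements.
    // 'en_US.UTF-8' is an assumption; setlocale() returns false if it is unavailable.
    setlocale(LC_CTYPE, 'en_US.UTF-8');

    // @ silences the notice raised when a character cannot be converted.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    // iconv() returns false on failure; fall back to the untouched input.
    return $ascii === false ? $text : $ascii;
}

var_dump(translit_sketch("Crème brûlée"));
```

On some systems accented letters come back as plain letters, on others as "?" or with stray quote characters, which matches the "doesn't always work" experience described above.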

Also, if you are using PHP 5.3, the new Normalizer class might be interesting ;-)