Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/10152894/

Time: 2020-08-24 21:30:55  Source: igfitidea

PHP replacing special characters like à->a, è->e

Tags: php, utf-8, preg-replace, decode

Asked by Zoran ?uki?

I have a PHP document, signup.php, which saves the content from a form (in form.php) to a MySQL database. The problem arises when I want to reformat the input content. I want to decode UTF-8 characters like à->a.

  $first_name=$_POST['first_name'];
  $last_name=$_POST['last_name'];
  $course=$_POST['course'];

  $chain="prêt-à-porter";

$pattern = array("'é'", "'è'", "'ë'", "'ê'", "'É'", "'È'", "'Ë'", "'Ê'", "'á'", "'à'", "'ä'", "'â'", "'å'", "'Á'", "'À'", "'Ä'", "'Â'", "'Å'", "'ó'", "'ò'", "'ö'", "'ô'", "'Ó'", "'Ò'", "'Ö'", "'Ô'", "'í'", "'ì'", "'ï'", "'î'", "'Í'", "'Ì'", "'Ï'", "'Î'", "'ú'", "'ù'", "'ü'", "'û'", "'Ú'", "'Ù'", "'Ü'", "'Û'", "'ý'", "'ÿ'", "'Ý'", "'ø'", "'Ø'", "'œ'", "'Œ'", "'Æ'", "'ç'", "'Ç'");

$replace = array('e', 'e', 'e', 'e', 'E', 'E', 'E', 'E', 'a', 'a', 'a', 'a', 'a', 'A', 'A', 'A', 'A', 'A', 'o', 'o', 'o', 'o', 'O', 'O', 'O', 'O', 'i', 'i', 'i', 'i', 'I', 'I', 'I', 'I', 'u', 'u', 'u', 'u', 'U', 'U', 'U', 'U', 'y', 'y', 'Y', 'o', 'O', 'a', 'A', 'A', 'c', 'C');

$chain = preg_replace($pattern, $replace, $chain);

echo $chain; // print pret-a-porter

$first_name =  preg_replace($pattern, $replace, $first_name);

echo $first_name; // does not change the input!?!

Why does it work perfectly for $chain, but not for $first_name or $last_name?

I also tried:

echo $first_name; // prints áááááábéééééébšššš
$trans = array("á" => "a", "é" => "e", "š" => "s");
echo strtr("áááááábéééééébšššš", $trans); // prints aaaaaabeeeeeebssss
echo strtr($first_name,$trans);  // prints áááááábéééééébšššš

But the problem, as you can see, is the same!

Answered by dmp

There's a much easier way to do this, using iconv. From the user notes, this seems to be what you want to do: character transliteration.

// PHP.net User notes
<?php
    $string = "ʿABBĀSĀBĀD";

    echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $string);
    // output: [nothing, and you get a notice]

    echo iconv('UTF-8', 'ISO-8859-1//IGNORE', $string);
    // output: ABBSBD

    echo iconv('UTF-8', 'ISO-8859-1//TRANSLIT//IGNORE', $string);
    // output: ABBASABAD
    // Yay! That's what I wanted!
?>

Be very conscientious with your character encodings, so that you keep the same encoding at every stage of the process: the front end, the form submission, and the encoding of the source files. Before PHP 5.4, functions such as htmlentities() assumed ISO-8859-1 by default; in 5.4 that default changed to UTF-8 (finally!).
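
One way to act on this advice is to check each form field's encoding at the point where it enters the script. The sketch below assumes the mbstring extension is loaded; requireUtf8 is a hypothetical helper name, not something from the answer.

```php
<?php
// Hypothetical helper (not from the answer): reject input that is not valid
// UTF-8 before any replacement logic runs. Requires the mbstring extension.
function requireUtf8(string $value): string
{
    if (!mb_check_encoding($value, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $value;
}

// "\xC3\xA1" is "á" encoded as UTF-8; "\xE1" is "á" encoded as ISO-8859-1.
echo requireUtf8("\xC3\xA1"), "\n";   // passes through unchanged
try {
    requireUtf8("\xE1");
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";      // Input is not valid UTF-8
}
```

Failing early like this makes an encoding mismatch visible instead of letting the replacement silently do nothing.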

There are a couple of functions you can play around with for ideas. The first, called slug, is from CakePHP's Inflector class:

public static function slug($string, $replacement = '_') {
    $quotedReplacement = preg_quote($replacement, '/');

    $merge = array(
        '/[^\s\p{Ll}\p{Lm}\p{Lo}\p{Lt}\p{Lu}\p{Nd}]/mu' => ' ',
        '/\s+/' => $replacement,
        sprintf('/^[%s]+|[%s]+$/', $quotedReplacement, $quotedReplacement) => '',
    );

    $map = self::$_transliteration + $merge;
    return preg_replace(array_keys($map), array_values($map), $string);
}

It depends on a self::$_transliteration array, which is similar to what you were doing in your question. You can see the source for Inflector on GitHub.
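
To see how the transliteration map and the cleanup rules combine, here is a standalone sketch. The two-entry $transliteration array and the name slugSketch are hypothetical stand-ins for illustration; the real CakePHP map is much larger.

```php
<?php
// Standalone sketch of the slug() idea with a tiny, hypothetical map.
function slugSketch(string $string, string $replacement = '_'): string
{
    // Hypothetical subset: regex pattern => ASCII replacement.
    $transliteration = [
        '/à|á|â|ä/u' => 'a',
        '/è|é|ê|ë/u' => 'e',
    ];
    $quoted = preg_quote($replacement, '/');
    $merge = [
        // anything that is not whitespace, a letter, or a digit becomes a space
        '/[^\s\p{Ll}\p{Lm}\p{Lo}\p{Lt}\p{Lu}\p{Nd}]/mu' => ' ',
        // runs of whitespace become the replacement character
        '/\s+/' => $replacement,
        // strip the replacement character from both ends
        sprintf('/^[%s]+|[%s]+$/', $quoted, $quoted) => '',
    ];
    // Array union keeps the transliteration rules first, so they run first.
    $map = $transliteration + $merge;
    return preg_replace(array_keys($map), array_values($map), $string);
}

echo slugSketch('prêt-à-porter!'), "\n"; // pret_a_porter
```

The order matters: accents are mapped to ASCII first, then everything that is still not a letter or digit is collapsed into the separator.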

Another is a function I use personally, which comes from here.

function slugify($text,$strict = false) {
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // replace non letter or digits by -
    $text = preg_replace('~[^\pL\d.]+~u', '-', $text);

    // trim
    $text = trim($text, '-');
    setlocale(LC_CTYPE, 'en_GB.utf8');
    // transliterate
    if (function_exists('iconv')) {
        $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    }

    // lowercase
    $text = strtolower($text);
    // remove unwanted characters
    $text = preg_replace('~[^-\w.]+~', '', $text);
    if (empty($text)) {
        return 'empty_$';
    }
    if ($strict) {
        $text = str_replace(".", "_", $text);
    }
    return $text;
}

What those functions do is transliterate and create 'slugs' from arbitrary text input, which is a very useful thing to have in your tool chest when making web apps. Hope this helps!

Answered by Dieter Gribnitz

Here is a way to have some flexibility in what should be discarded and what should be replaced. This is how I currently do it.

$string = 'à some string with junk ? ? ';

$replace = [
    '&lt;' => '', '&gt;' => '', '&#039;' => '', '&amp;' => '',
    '&quot;' => '', 'à' => 'A', 'á' => 'A', '?' => 'A', '?' => 'A', '?' => 'Ae',
    '&Auml;' => 'A', '?' => 'A', 'ā' => 'A', '?' => 'A', '?' => 'A', '?' => 'Ae',
    '?' => 'C', '?' => 'C', '?' => 'C', '?' => 'C', '?' => 'C', '?' => 'D', '?' => 'D',
    'D' => 'D', 'è' => 'E', 'é' => 'E', 'ê' => 'E', '?' => 'E', 'ē' => 'E',
    '?' => 'E', 'ě' => 'E', '?' => 'E', '?' => 'E', '?' => 'G', '?' => 'G',
    '?' => 'G', '?' => 'G', '?' => 'H', '?' => 'H', 'ì' => 'I', 'í' => 'I',
    '?' => 'I', '?' => 'I', 'ī' => 'I', '?' => 'I', '?' => 'I', '?' => 'I',
    '?' => 'I', '?' => 'IJ', '?' => 'J', '?' => 'K', '?' => 'K', '?' => 'K',
    '?' => 'K', '?' => 'K', '?' => 'K', '?' => 'N', '?' => 'N', '?' => 'N',
    '?' => 'N', '?' => 'N', 'ò' => 'O', 'ó' => 'O', '?' => 'O', '?' => 'O',
    '?' => 'Oe', '&Ouml;' => 'Oe', '?' => 'O', 'ō' => 'O', '?' => 'O', '?' => 'O',
    '?' => 'OE', '?' => 'R', '?' => 'R', '?' => 'R', '?' => 'S', '?' => 'S',
    '?' => 'S', '?' => 'S', '?' => 'S', '?' => 'T', '?' => 'T', '?' => 'T',
    '?' => 'T', 'ù' => 'U', 'ú' => 'U', '?' => 'U', 'ü' => 'Ue', 'ū' => 'U',
    '&Uuml;' => 'Ue', '?' => 'U', '?' => 'U', '?' => 'U', '?' => 'U', '?' => 'U',
    '?' => 'W', 'Y' => 'Y', '?' => 'Y', '?' => 'Y', '?' => 'Z', '?' => 'Z',
    '?' => 'Z', 'T' => 'T', 'à' => 'a', 'á' => 'a', 'a' => 'a', '?' => 'a',
    '?' => 'ae', '&auml;' => 'ae', '?' => 'a', 'ā' => 'a', '?' => 'a', '?' => 'a',
    '?' => 'ae', '?' => 'c', '?' => 'c', '?' => 'c', '?' => 'c', '?' => 'c',
    '?' => 'd', '?' => 'd', 'e' => 'd', 'è' => 'e', 'é' => 'e', 'ê' => 'e',
    '?' => 'e', 'ē' => 'e', '?' => 'e', 'ě' => 'e', '?' => 'e', '?' => 'e',
    '?' => 'f', '?' => 'g', '?' => 'g', '?' => 'g', '?' => 'g', '?' => 'h',
    '?' => 'h', 'ì' => 'i', 'í' => 'i', '?' => 'i', '?' => 'i', 'ī' => 'i',
    '?' => 'i', '?' => 'i', '?' => 'i', '?' => 'i', '?' => 'ij', '?' => 'j',
    '?' => 'k', '?' => 'k', '?' => 'l', '?' => 'l', '?' => 'l', '?' => 'l',
    '?' => 'l', '?' => 'n', 'ń' => 'n', 'ň' => 'n', '?' => 'n', '?' => 'n',
    '?' => 'n', 'ò' => 'o', 'ó' => 'o', '?' => 'o', '?' => 'o', '?' => 'oe',
    '&ouml;' => 'oe', '?' => 'o', 'ō' => 'o', '?' => 'o', '?' => 'o', '?' => 'oe',
    '?' => 'r', '?' => 'r', '?' => 'r', '?' => 's', 'ù' => 'u', 'ú' => 'u',
    '?' => 'u', 'ü' => 'ue', 'ū' => 'u', '&uuml;' => 'ue', '?' => 'u', '?' => 'u',
    '?' => 'u', '?' => 'u', '?' => 'u', '?' => 'w', 'y' => 'y', '?' => 'y',
    '?' => 'y', '?' => 'z', '?' => 'z', '?' => 'z', 't' => 't', '?' => 'ss',
    '?' => 'ss', 'ый' => 'iy', 'А' => 'A', 'Б' => 'B', 'В' => 'V', 'Г' => 'G',
    'Д' => 'D', 'Е' => 'E', 'Ё' => 'YO', 'Ж' => 'ZH', 'З' => 'Z', 'И' => 'I',
    'Й' => 'Y', 'К' => 'K', 'Л' => 'L', 'М' => 'M', 'Н' => 'N', 'О' => 'O',
    'П' => 'P', 'Р' => 'R', 'С' => 'S', 'Т' => 'T', 'У' => 'U', 'Ф' => 'F',
    'Х' => 'H', 'Ц' => 'C', 'Ч' => 'CH', 'Ш' => 'SH', 'Щ' => 'SCH', 'Ъ' => '',
    'Ы' => 'Y', 'Ь' => '', 'Э' => 'E', 'Ю' => 'YU', 'Я' => 'YA', 'а' => 'a',
    'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e', 'ё' => 'yo',
    'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'y', 'к' => 'k', 'л' => 'l',
    'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's',
    'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch',
    'ш' => 'sh', 'щ' => 'sch', 'ъ' => '', 'ы' => 'y', 'ь' => '', 'э' => 'e',
    'ю' => 'yu', 'я' => 'ya'
];

echo str_replace(array_keys($replace), $replace, $string);  
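
One thing to keep in mind with a large map like this: str_replace applies the pairs one after another, so the output of an earlier pair can be rewritten by a later one. strtr with an array makes a single pass instead. The two-entry map below is a hypothetical illustration, not part of the answer.

```php
<?php
// Small hypothetical map where one replacement's output is another entry's key.
$map = ['ä' => 'ae', 'e' => 'E'];

// str_replace applies the pairs in order, so the "e" produced by the
// first pair is rewritten by the second pair.
echo str_replace(array_keys($map), array_values($map), 'käse'), "\n"; // kaEsE

// strtr with an array scans the input once and never re-replaces its own output.
echo strtr('käse', $map), "\n"; // kaesE
```

If the map contains plain ASCII keys as well as accented ones, strtr is the more predictable choice.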

Answered by woodscreative

As of PHP 5.4.0 (requires the intl extension):

$translatedString = transliterator_transliterate('Any-Latin; Latin-ASCII; [\u0080-\u7fff] remove', $string);
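
A minimal usage sketch of that call, wrapped in a hypothetical helper named toAscii. It assumes the intl extension may or may not be present and simply returns the input unchanged when it is missing or when transliteration fails.

```php
<?php
// Hypothetical wrapper around the one-liner above; toAscii is not a PHP built-in.
function toAscii(string $text): string
{
    if (!function_exists('transliterator_transliterate')) {
        return $text; // intl not loaded: leave the text untouched
    }
    $rules = 'Any-Latin; Latin-ASCII; [\u0080-\u7fff] remove';
    $result = transliterator_transliterate($rules, $text);
    return $result === false ? $text : $result;
}

echo toAscii('prêt-à-porter'), "\n"; // pret-a-porter when intl is loaded
```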

Answered by Peter Bagnall

The string $chain is in the same character encoding as the characters in the array - it's possible, even likely, that the $first_name string is in a different encoding, and so those characters don't match. You might want to try using the multibyte string functions instead.

Try mb_convert_encoding. You might also want to try using HTML-ENTITIES as the to_encoding parameter; then you don't need to worry about how the characters will get converted, since the result will be very predictable.

Assuming the input to this script is in UTF-8, this is probably not a bad place to start:

$first_name = mb_convert_encoding($first_name, "HTML-ENTITIES", "UTF-8"); 
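
The mismatch described in this answer can be reproduced with explicit bytes. The sketch below assumes, purely for illustration, that the form input arrives as ISO-8859-1 while the replacement table is written in UTF-8; it needs the mbstring extension.

```php
<?php
// The same visible character has different bytes in different encodings.
$utf8   = "\xC3\xA1"; // "á" as UTF-8 (two bytes)
$latin1 = "\xE1";     // "á" as ISO-8859-1 (one byte)

$trans = [$utf8 => 'a']; // replacement table written in UTF-8

// No match: the input comes back unchanged.
echo bin2hex(strtr($latin1, $trans)), "\n"; // e1

// Convert the input to the table's encoding first
// (assumes the input really is ISO-8859-1).
$converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo strtr($converted, $trans), "\n";       // a
```

This is why the hard-coded $chain works and the posted value does not: only $chain shares the source file's encoding.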

Answered by ChickenFeet

I wish I had found this thread sooner. The function I made (which took me way too long) is below:

function CheckLetters($field){
    $letters = [
        0 => "a à á a ? ? ? ? ā",
        1 => "c ? ? ?",
        2 => "e é è ê ? ? ? ē",
        3 => "i ī ? í ì ? ?",
        4 => "l ?",
        5 => "n ? ń",
        6 => "o ō ? ? ? ó ò ? ?",
        7 => "s ? ? ?",
        8 => "u ū ú ù ü ?",
        9 => "w ?",
        10 => "y ? ?",
        11 => "z ? ? ?",
    ];
    foreach ($letters as &$values){
        $newValue = substr($values, 0, 1);
        $values = substr($values, 2, strlen($values));
        $values = explode(" ", $values);
        foreach ($values as &$oldValue){
            while (strpos($field,$oldValue) !== false){
                $field = preg_replace("/" . $oldValue . '/', $newValue, $field, 1);
            }
        }
    }
    return $field;
}
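
The same "base letter followed by its variants" layout can also be turned into a lookup table once and applied with strtr, which avoids the nested loops. This is a hypothetical alternative sketch with shortened groups, not the author's code; buildLetterMap is an illustrative name.

```php
<?php
// Hypothetical rewrite of the same idea: each group is a base letter
// followed by the variants that should collapse to it.
function buildLetterMap(array $groups): array
{
    $map = [];
    foreach ($groups as $group) {
        $variants = explode(' ', $group);
        $base = array_shift($variants); // first entry is the replacement
        foreach ($variants as $variant) {
            $map[$variant] = $base;
        }
    }
    return $map;
}

$map = buildLetterMap(['a à á', 'e é è ê']); // shortened illustrative groups
echo strtr('prêt-à-porter', $map), "\n";     // pret-a-porter
```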

Answered by alex iancu

CodeIgniter way:

$this->load->helper('text');

$string = convert_accented_characters($string);

This function uses a companion config file, application/config/foreign_chars.php, to define the to and from arrays for transliteration.

https://www.codeigniter.com/user_guide/helpers/text_helper.html#ascii_to_entities

Answered by Jonas Elan

Simple function. Transforms strings like 'ábç éfg' to 'abc_efg'.

/**
 * @param $str
 * @return mixed
 */
function sanitizeString($str) {
    $str = preg_replace('/[áàãâä]/ui', 'a', $str);
    $str = preg_replace('/[éèêë]/ui', 'e', $str);
    $str = preg_replace('/[íìîï]/ui', 'i', $str);
    $str = preg_replace('/[óòõôö]/ui', 'o', $str);
    $str = preg_replace('/[úùûü]/ui', 'u', $str);
    $str = preg_replace('/[ç]/ui', 'c', $str);
    $str = preg_replace('/[^a-z0-9]/i', '_', $str);
    $str = preg_replace('/_+/', '_', $str);

    return $str;
}
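
The u modifier in these patterns is what makes the character classes work on UTF-8 input. A short sketch of the difference, assuming the source file is saved as UTF-8:

```php
<?php
// Assumes this file is saved as UTF-8, so "á" is the two bytes C3 A1.
$input = 'á';

// Without /u the class is a set of single bytes, so each byte of "á" matches.
echo preg_replace('/[áà]/', 'a', $input), "\n";  // aa

// With /u the pattern and subject are treated as UTF-8 characters.
echo preg_replace('/[áà]/u', 'a', $input), "\n"; // a
```

Leaving the modifier off can also produce invalid byte sequences when only one byte of a character happens to be in the class.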