How to remove accents from characters in a PHP string?

Question

Asked by georgebrock

I'm attempting to remove accents from characters in a PHP string as the first step to making the string usable in a URL.

I'm using the following code:

$input = "Fóø Bår";

setlocale(LC_ALL, "en_US.utf8");
$output = iconv("utf-8", "ascii//TRANSLIT", $input);

print($output);

The output I would expect would be something like this:

F'oo Bar

However, instead of the accented characters being transliterated they are replaced with question marks:

F?? B?r

Everything I can find online indicates that setting the locale will fix this problem; however, I'm already doing this. I've already checked the following details:

The locale I am setting is supported by the server (included in the list produced by locale -a)
The source and target encodings (UTF-8 and ASCII) are supported by the server's version of iconv (included in the list produced by iconv -l)
The input string is UTF-8 encoded (verified using PHP's mb_check_encoding function, as suggested in the answer by mercator)
The call to setlocale is successful (it returns 'en_US.utf8' rather than FALSE)
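
The two PHP-side checks in that list can be reproduced in a few lines. This is only a minimal sketch, assuming the mbstring extension is loaded; the sample string is written with \u{...} escapes (PHP 7+) so that its bytes are UTF-8 whatever the encoding of the script file:

```php
<?php
// "Fóø Bår", spelled with escapes so the string is UTF-8 by construction.
$input = "F\u{00F3}\u{00F8} B\u{00E5}r";

// Check 1: is the input valid UTF-8?
$isUtf8 = mb_check_encoding($input, 'UTF-8');

// Check 2: did setlocale() accept the locale?
// It returns the name of the new locale on success and false on failure.
$locale = setlocale(LC_ALL, 'en_US.utf8');

var_dump($isUtf8, $locale);
```

If the second value is false, the locale is not installed for PHP's C library, even if another tool lists it.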

The cause of the problem:

The server is using the wrong implementation of iconv. It has the glibc version instead of the required libiconv version.

Note that the iconv function on some systems may not work as you expect. In such case, it'd be a good idea to install the GNU libiconv library. It will most likely end up with more consistent results.
– PHP manual's introduction to iconv

Details about the iconv implementation that is used by PHP are included in the output of the phpinfo function.
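
The same information can also be read without opening the whole phpinfo page. A small sketch, relying on the ICONV_IMPL and ICONV_VERSION constants that PHP's iconv extension defines:

```php
<?php
// ICONV_IMPL names the implementation PHP was built against
// (for example "glibc" or "libiconv"); ICONV_VERSION is its version.
if (extension_loaded('iconv')) {
    printf("iconv implementation: %s\n", ICONV_IMPL);
    printf("iconv version: %s\n", ICONV_VERSION);
} else {
    echo "The iconv extension is not loaded.\n";
}
```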

(I'm not able to re-compile PHP with the correct iconv library on the server I'm working with for this project so the answer I've accepted below is the one that was most useful for removing accents without iconv support.)

Answer 1

Accepted answer by Jeremy Smyth

I think the problem here is that your encodings consider ä and å different symbols to 'a'. In fact, the PHP documentation for strtr offers a sample for removing accents the ugly way :(

http://ie2.php.net/strtr
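
For illustration, the array form of strtr is the variant that is safe with UTF-8 input, because each multi-byte character is replaced as a whole key rather than byte by byte. A minimal sketch; the map below is a deliberately tiny, hypothetical example and not the one from the manual:

```php
<?php
// Hypothetical, incomplete map: extend it with the characters you expect.
$map = [
    "\u{00E4}" => 'a', // ä
    "\u{00E5}" => 'a', // å
    "\u{00F3}" => 'o', // ó
    "\u{00F8}" => 'o', // ø
];

$input = "F\u{00F3}\u{00F8} B\u{00E5}r"; // "Fóø Bår" as UTF-8

// With an array argument, strtr() replaces whole keys, longest first.
echo strtr($input, $map), "\n"; // Foo Bar
```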

Answer 2

Answered by dynamic

What about the WordPress implementation?

function remove_accents($string) {
    if ( !preg_match('/[\x80-\xff]/', $string) )
        return $string;

    $chars = array(
    // Decompositions for Latin-1 Supplement
    chr(195).chr(128) => 'A', chr(195).chr(129) => 'A',
    chr(195).chr(130) => 'A', chr(195).chr(131) => 'A',
    chr(195).chr(132) => 'A', chr(195).chr(133) => 'A',
    chr(195).chr(135) => 'C', chr(195).chr(136) => 'E',
    chr(195).chr(137) => 'E', chr(195).chr(138) => 'E',
    chr(195).chr(139) => 'E', chr(195).chr(140) => 'I',
    chr(195).chr(141) => 'I', chr(195).chr(142) => 'I',
    chr(195).chr(143) => 'I', chr(195).chr(145) => 'N',
    chr(195).chr(146) => 'O', chr(195).chr(147) => 'O',
    chr(195).chr(148) => 'O', chr(195).chr(149) => 'O',
    chr(195).chr(150) => 'O', chr(195).chr(153) => 'U',
    chr(195).chr(154) => 'U', chr(195).chr(155) => 'U',
    chr(195).chr(156) => 'U', chr(195).chr(157) => 'Y',
    chr(195).chr(159) => 's', chr(195).chr(160) => 'a',
    chr(195).chr(161) => 'a', chr(195).chr(162) => 'a',
    chr(195).chr(163) => 'a', chr(195).chr(164) => 'a',
    chr(195).chr(165) => 'a', chr(195).chr(167) => 'c',
    chr(195).chr(168) => 'e', chr(195).chr(169) => 'e',
    chr(195).chr(170) => 'e', chr(195).chr(171) => 'e',
    chr(195).chr(172) => 'i', chr(195).chr(173) => 'i',
    chr(195).chr(174) => 'i', chr(195).chr(175) => 'i',
    chr(195).chr(177) => 'n', chr(195).chr(178) => 'o',
    chr(195).chr(179) => 'o', chr(195).chr(180) => 'o',
    chr(195).chr(181) => 'o', chr(195).chr(182) => 'o',
    chr(195).chr(182) => 'o', chr(195).chr(185) => 'u',
    chr(195).chr(186) => 'u', chr(195).chr(187) => 'u',
    chr(195).chr(188) => 'u', chr(195).chr(189) => 'y',
    chr(195).chr(191) => 'y',
    // Decompositions for Latin Extended-A
    chr(196).chr(128) => 'A', chr(196).chr(129) => 'a',
    chr(196).chr(130) => 'A', chr(196).chr(131) => 'a',
    chr(196).chr(132) => 'A', chr(196).chr(133) => 'a',
    chr(196).chr(134) => 'C', chr(196).chr(135) => 'c',
    chr(196).chr(136) => 'C', chr(196).chr(137) => 'c',
    chr(196).chr(138) => 'C', chr(196).chr(139) => 'c',
    chr(196).chr(140) => 'C', chr(196).chr(141) => 'c',
    chr(196).chr(142) => 'D', chr(196).chr(143) => 'd',
    chr(196).chr(144) => 'D', chr(196).chr(145) => 'd',
    chr(196).chr(146) => 'E', chr(196).chr(147) => 'e',
    chr(196).chr(148) => 'E', chr(196).chr(149) => 'e',
    chr(196).chr(150) => 'E', chr(196).chr(151) => 'e',
    chr(196).chr(152) => 'E', chr(196).chr(153) => 'e',
    chr(196).chr(154) => 'E', chr(196).chr(155) => 'e',
    chr(196).chr(156) => 'G', chr(196).chr(157) => 'g',
    chr(196).chr(158) => 'G', chr(196).chr(159) => 'g',
    chr(196).chr(160) => 'G', chr(196).chr(161) => 'g',
    chr(196).chr(162) => 'G', chr(196).chr(163) => 'g',
    chr(196).chr(164) => 'H', chr(196).chr(165) => 'h',
    chr(196).chr(166) => 'H', chr(196).chr(167) => 'h',
    chr(196).chr(168) => 'I', chr(196).chr(169) => 'i',
    chr(196).chr(170) => 'I', chr(196).chr(171) => 'i',
    chr(196).chr(172) => 'I', chr(196).chr(173) => 'i',
    chr(196).chr(174) => 'I', chr(196).chr(175) => 'i',
    chr(196).chr(176) => 'I', chr(196).chr(177) => 'i',
    chr(196).chr(178) => 'IJ',chr(196).chr(179) => 'ij',
    chr(196).chr(180) => 'J', chr(196).chr(181) => 'j',
    chr(196).chr(182) => 'K', chr(196).chr(183) => 'k',
    chr(196).chr(184) => 'k', chr(196).chr(185) => 'L',
    chr(196).chr(186) => 'l', chr(196).chr(187) => 'L',
    chr(196).chr(188) => 'l', chr(196).chr(189) => 'L',
    chr(196).chr(190) => 'l', chr(196).chr(191) => 'L',
    chr(197).chr(128) => 'l', chr(197).chr(129) => 'L',
    chr(197).chr(130) => 'l', chr(197).chr(131) => 'N',
    chr(197).chr(132) => 'n', chr(197).chr(133) => 'N',
    chr(197).chr(134) => 'n', chr(197).chr(135) => 'N',
    chr(197).chr(136) => 'n', chr(197).chr(137) => 'N',
    chr(197).chr(138) => 'n', chr(197).chr(139) => 'N',
    chr(197).chr(140) => 'O', chr(197).chr(141) => 'o',
    chr(197).chr(142) => 'O', chr(197).chr(143) => 'o',
    chr(197).chr(144) => 'O', chr(197).chr(145) => 'o',
    chr(197).chr(146) => 'OE',chr(197).chr(147) => 'oe',
    chr(197).chr(148) => 'R',chr(197).chr(149) => 'r',
    chr(197).chr(150) => 'R',chr(197).chr(151) => 'r',
    chr(197).chr(152) => 'R',chr(197).chr(153) => 'r',
    chr(197).chr(154) => 'S',chr(197).chr(155) => 's',
    chr(197).chr(156) => 'S',chr(197).chr(157) => 's',
    chr(197).chr(158) => 'S',chr(197).chr(159) => 's',
    chr(197).chr(160) => 'S', chr(197).chr(161) => 's',
    chr(197).chr(162) => 'T', chr(197).chr(163) => 't',
    chr(197).chr(164) => 'T', chr(197).chr(165) => 't',
    chr(197).chr(166) => 'T', chr(197).chr(167) => 't',
    chr(197).chr(168) => 'U', chr(197).chr(169) => 'u',
    chr(197).chr(170) => 'U', chr(197).chr(171) => 'u',
    chr(197).chr(172) => 'U', chr(197).chr(173) => 'u',
    chr(197).chr(174) => 'U', chr(197).chr(175) => 'u',
    chr(197).chr(176) => 'U', chr(197).chr(177) => 'u',
    chr(197).chr(178) => 'U', chr(197).chr(179) => 'u',
    chr(197).chr(180) => 'W', chr(197).chr(181) => 'w',
    chr(197).chr(182) => 'Y', chr(197).chr(183) => 'y',
    chr(197).chr(184) => 'Y', chr(197).chr(185) => 'Z',
    chr(197).chr(186) => 'z', chr(197).chr(187) => 'Z',
    chr(197).chr(188) => 'z', chr(197).chr(189) => 'Z',
    chr(197).chr(190) => 'z', chr(197).chr(191) => 's'
    );

    $string = strtr($string, $chars);

    return $string;
}

To better understand what this function does, check the corresponding conversion table:

À => A
Á => A
Â => A
Ã => A
Ä => A
Å => A
Ç => C
È => E
É => E
Ê => E
Ë => E
Ì => I
Í => I
Î => I
Ï => I
Ñ => N
Ò => O
Ó => O
Ô => O
Õ => O
Ö => O
Ù => U
Ú => U
Û => U
Ü => U
Ý => Y
ß => s
à => a
á => a
â => a
ã => a
ä => a
å => a
ç => c
è => e
é => e
ê => e
ë => e
ì => i
í => i
î => i
ï => i
ñ => n
ò => o
ó => o
ô => o
õ => o
ö => o
ù => u
ú => u
û => u
ü => u
ý => y
ÿ => y
Ā => A
ā => a
Ă => A
ă => a
Ą => A
ą => a
Ć => C
ć => c
Ĉ => C
ĉ => c
Ċ => C
ċ => c
Č => C
č => c
Ď => D
ď => d
Đ => D
đ => d
Ē => E
ē => e
Ĕ => E
ĕ => e
Ė => E
ė => e
Ę => E
ę => e
Ě => E
ě => e
Ĝ => G
ĝ => g
Ğ => G
ğ => g
Ġ => G
ġ => g
Ģ => G
ģ => g
Ĥ => H
ĥ => h
Ħ => H
ħ => h
Ĩ => I
ĩ => i
Ī => I
ī => i
Ĭ => I
ĭ => i
Į => I
į => i
İ => I
ı => i
Ĳ => IJ
ĳ => ij
Ĵ => J
ĵ => j
Ķ => K
ķ => k
ĸ => k
Ĺ => L
ĺ => l
Ļ => L
ļ => l
Ľ => L
ľ => l
Ŀ => L
ŀ => l
Ł => L
ł => l
Ń => N
ń => n
Ņ => N
ņ => n
Ň => N
ň => n
ŉ => N
Ŋ => n
ŋ => N
Ō => O
ō => o
Ŏ => O
ŏ => o
Ő => O
ő => o
Œ => OE
œ => oe
Ŕ => R
ŕ => r
Ŗ => R
ŗ => r
Ř => R
ř => r
Ś => S
ś => s
Ŝ => S
ŝ => s
Ş => S
ş => s
Š => S
š => s
Ţ => T
ţ => t
Ť => T
ť => t
Ŧ => T
ŧ => t
Ũ => U
ũ => u
Ū => U
ū => u
Ŭ => U
ŭ => u
Ů => U
ů => u
Ű => U
ű => u
Ų => U
ų => u
Ŵ => W
ŵ => w
Ŷ => Y
ŷ => y
Ÿ => Y
Ź => Z
ź => z
Ż => Z
ż => z
Ž => Z
ž => z
ſ => s

You can generate this conversion table yourself by simply iterating over the function's $chars array:

foreach ($chars as $k => $v) {
    // One "character => replacement" pair per line
    printf("%s => %s\n", $k, $v);
}

Answer 3

Answered by Gino

This is a piece of code I found and use often:

function stripAccents($stripAccents){
  return strtr($stripAccents,'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ','aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY');
}

Answer 4

Answered by Cazuma Nii Cavalcanti

UTF-8 friendly version of the simple function posted above by Gino:

function stripAccents($str) {
    return strtr(utf8_decode($str), utf8_decode('àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ'), 'aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY');
}

I had to resort to this because my PHP document was UTF-8 encoded.

Hope it helps.
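
One caveat for newer PHP versions: utf8_decode is deprecated as of PHP 8.2. The same idea can be sketched with mb_convert_encoding instead. This is only an illustration, assuming the mbstring extension is loaded and the script file itself is saved as UTF-8; the function name strip_accents_latin1 is made up for the example:

```php
<?php
// Sketch: mb_convert_encoding() stands in for the deprecated utf8_decode().
function strip_accents_latin1(string $str): string
{
    $from = 'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ';
    $to   = 'aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY';

    // Convert both strings to single-byte ISO-8859-1 so that the
    // string form of strtr() can map them byte for byte.
    return strtr(
        mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8'),
        mb_convert_encoding($from, 'ISO-8859-1', 'UTF-8'),
        $to
    );
}

echo strip_accents_latin1('séntènce'), "\n"; // sentence
```

Note that the result is ISO-8859-1, not UTF-8, so any character missing from the map comes back as a Latin-1 byte (or as "?" if it has no Latin-1 equivalent).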

Answer 5

Answered by gabo

If you have the intl extension (http://php.net/manual/en/book.intl.php) available, this solves your problem:

$string = "Fóø Bår";
$transliterator = Transliterator::createFromRules(':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD);
echo $normalized = $transliterator->transliterate($string);

Answer 6

Answered by langpavel

When using iconv, the locale must be set:

function test_enc($text = 'ěščřžýáíé ĚŠČŘŽÝÁÍÉ fóø bår FÓØ BÅR æ')
{
    echo '<tt>';
    echo iconv('utf8', 'ascii//TRANSLIT', $text);
    echo '</tt><br/>';
} 

test_enc();
setlocale(LC_ALL, 'cs_CZ.utf8');
test_enc();
setlocale(LC_ALL, 'en_US.utf8');
test_enc();

This yields:

????????? ????????? f?? b?r F?? B?R ae
escrzyaie ESCRZYAIE fo? bar FO? BAR ae
escrzyaie ESCRZYAIE fo? bar FO? BAR ae

I don't have any locales other than cs_CZ and en_US installed, so I can't test others.

In C# I have seen a solution that converts the string to a Unicode normalized (decomposed) form: the accents are split off from their base letters and then filtered out via the nonspacing-mark Unicode category.
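
The same approach can be sketched in PHP. This is a rough illustration rather than a tested recipe, assuming the intl extension (which provides the Normalizer class) is installed; the function name strip_combining_marks is made up for the example:

```php
<?php
// Sketch: decompose to NFD, then drop the nonspacing marks.
function strip_combining_marks(string $text): string
{
    // NFD splits e.g. "é" into "e" followed by U+0301 (combining acute).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // normalization failed (e.g. invalid UTF-8): leave as is
    }
    // \p{Mn} matches nonspacing marks; /u makes PCRE treat the text as UTF-8.
    return preg_replace('/\p{Mn}+/u', '', $decomposed);
}

if (class_exists('Normalizer')) {
    // "Fóø Bår": ø has no decomposition, so it survives unchanged.
    echo strip_combining_marks("F\u{00F3}\u{00F8} B\u{00E5}r"), "\n"; // Foø Bar
}
```

Letters such as ø, æ or ß are not a base letter plus a mark, so this technique alone leaves them in place; they still need an explicit map or a transliterator.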

Answer 7

Answered by Waiyl Karim

The easiest way is to use PHP's native iconv() function.

 echo iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', "Thîs îs à vêry wrong séntènce!");

 // output: This is a very wrong sentence!
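
Since the goal in the question is a URL-friendly string, the iconv call can be wrapped in a small slug helper. A hedged sketch only: slugify is a hypothetical name, and what iconv returns for accented input depends on the iconv implementation and locale, as discussed in the question:

```php
<?php
// Hypothetical helper: transliterate with iconv when possible, then slugify.
function slugify(string $text): string
{
    // iconv() returns false when the conversion fails; fall back to the
    // original text in that case (or when the extension is missing).
    $ascii = function_exists('iconv')
        ? @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text)
        : false;
    if ($ascii === false) {
        $ascii = $text;
    }

    // Lowercase, turn every run of other characters into one hyphen,
    // then trim hyphens from both ends.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($ascii));
    return trim($slug, '-');
}

echo slugify('Hello, World!'), "\n"; // hello-world
```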

Answer 8

Answered by Junior Mayhé

Indeed, it is a matter of taste. There are many flavors for converting such letters.

function replaceAccents($str)
{
  $a = array('À', 'Á', 'Â', 'Ã', 'Ä', 'Å', 'Æ', 'Ç', 'È', 'É', 'Ê', 'Ë', 'Ì', 'Í', 'Î', 'Ï', 'Ð', 'Ñ', 'Ò', 'Ó', 'Ô', 'Õ', 'Ö', 'Ø', 'Ù', 'Ú', 'Û', 'Ü', 'Ý', 'ß', 'à', 'á', 'â', 'ã', 'ä', 'å', 'æ', 'ç', 'è', 'é', 'ê', 'ë', 'ì', 'í', 'î', 'ï', 'ñ', 'ò', 'ó', 'ô', 'õ', 'ö', 'ø', 'ù', 'ú', 'û', 'ü', 'ý', 'ÿ', 'Ā', 'ā', 'Ă', 'ă', 'Ą', 'ą', 'Ć', 'ć', 'Ĉ', 'ĉ', 'Ċ', 'ċ', 'Č', 'č', 'Ď', 'ď', 'Đ', 'đ', 'Ē', 'ē', 'Ĕ', 'ĕ', 'Ė', 'ė', 'Ę', 'ę', 'Ě', 'ě', 'Ĝ', 'ĝ', 'Ğ', 'ğ', 'Ġ', 'ġ', 'Ģ', 'ģ', 'Ĥ', 'ĥ', 'Ħ', 'ħ', 'Ĩ', 'ĩ', 'Ī', 'ī', 'Ĭ', 'ĭ', 'Į', 'į', 'İ', 'ı', 'Ĳ', 'ĳ', 'Ĵ', 'ĵ', 'Ķ', 'ķ', 'Ĺ', 'ĺ', 'Ļ', 'ļ', 'Ľ', 'ľ', 'Ŀ', 'ŀ', 'Ł', 'ł', 'Ń', 'ń', 'Ņ', 'ņ', 'Ň', 'ň', 'ŉ', 'Ō', 'ō', 'Ŏ', 'ŏ', 'Ő', 'ő', 'Œ', 'œ', 'Ŕ', 'ŕ', 'Ŗ', 'ŗ', 'Ř', 'ř', 'Ś', 'ś', 'Ŝ', 'ŝ', 'Ş', 'ş', 'Š', 'š', 'Ţ', 'ţ', 'Ť', 'ť', 'Ŧ', 'ŧ', 'Ũ', 'ũ', 'Ū', 'ū', 'Ŭ', 'ŭ', 'Ů', 'ů', 'Ű', 'ű', 'Ų', 'ų', 'Ŵ', 'ŵ', 'Ŷ', 'ŷ', 'Ÿ', 'Ź', 'ź', 'Ż', 'ż', 'Ž', 'ž', 'ſ', 'ƒ', 'Ơ', 'ơ', 'Ư', 'ư', 'Ǎ', 'ǎ', 'Ǐ', 'ǐ', 'Ǒ', 'ǒ', 'Ǔ', 'ǔ', 'Ǖ', 'ǖ', 'Ǘ', 'ǘ', 'Ǚ', 'ǚ', 'Ǜ', 'ǜ', 'Ǻ', 'ǻ', 'Ǽ', 'ǽ', 'Ǿ', 'ǿ');
  $b = array('A', 'A', 'A', 'A', 'A', 'A', 'AE', 'C', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I', 'D', 'N', 'O', 'O', 'O', 'O', 'O', 'O', 'U', 'U', 'U', 'U', 'Y', 's', 'a', 'a', 'a', 'a', 'a', 'a', 'ae', 'c', 'e', 'e', 'e', 'e', 'i', 'i', 'i', 'i', 'n', 'o', 'o', 'o', 'o', 'o', 'o', 'u', 'u', 'u', 'u', 'y', 'y', 'A', 'a', 'A', 'a', 'A', 'a', 'C', 'c', 'C', 'c', 'C', 'c', 'C', 'c', 'D', 'd', 'D', 'd', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'G', 'g', 'G', 'g', 'G', 'g', 'G', 'g', 'H', 'h', 'H', 'h', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'IJ', 'ij', 'J', 'j', 'K', 'k', 'L', 'l', 'L', 'l', 'L', 'l', 'L', 'l', 'l', 'l', 'N', 'n', 'N', 'n', 'N', 'n', 'n', 'O', 'o', 'O', 'o', 'O', 'o', 'OE', 'oe', 'R', 'r', 'R', 'r', 'R', 'r', 'S', 's', 'S', 's', 'S', 's', 'S', 's', 'T', 't', 'T', 't', 'T', 't', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'W', 'w', 'Y', 'y', 'Y', 'Z', 'z', 'Z', 'z', 'Z', 'z', 's', 'f', 'O', 'o', 'U', 'u', 'A', 'a', 'I', 'i', 'O', 'o', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'A', 'a', 'AE', 'ae', 'O', 'o');
  return str_replace($a, $b, $str);
}

Answer 9

Answered by Xetius

You could use urlencode. It does not quite do what you want (remove accents), but it will give you a URL-usable string:

$output = urlencode ($input);
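
To make the trade-off concrete, here is roughly what that produces for the string from the question (a small sketch; the input is written with \u{...} escapes so it is UTF-8 by construction):

```php
<?php
$input = "F\u{00F3}\u{00F8} B\u{00E5}r"; // "Fóø Bår" as UTF-8

// urlencode() percent-encodes each non-ASCII byte and turns spaces into "+".
echo urlencode($input), "\n";    // F%C3%B3%C3%B8+B%C3%A5r

// rawurlencode() encodes spaces as %20 instead.
echo rawurlencode($input), "\n"; // F%C3%B3%C3%B8%20B%C3%A5r
```

The result is valid in a URL but not very readable, which is why the other answers transliterate first.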

In Perl I could use a tr (transliteration) expression, but I cannot think of the PHP equivalent:

$input =~ tr/áâàå/aaaa/;

etc...

等等...

You could do this using preg_replace:

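// Character classes need no '|' separators (a '|' inside [] is a literal pipe).
// The /u flag makes PCRE treat pattern and subject as UTF-8.
$patterns[0] = '/[áâàåä]/u';
$patterns[1] = '/[éêèë]/u';
$patterns[2] = '/[íîìï]/u';
$patterns[3] = '/[óôòøõö]/u';
$patterns[4] = '/[úûùü]/u';
$patterns[5] = '/æ/u';
$patterns[6] = '/ç/u';
$patterns[7] = '/ß/u';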
$replacements[0] = 'a';
$replacements[1] = 'e';
$replacements[2] = 'i';
$replacements[3] = 'o';
$replacements[4] = 'u';
$replacements[5] = 'ae';
$replacements[6] = 'c';
$replacements[7] = 'ss';

$output = preg_replace($patterns, $replacements, $input);

(Please note this was typed from a foggy, beer-ridden Friday afternoon memory, so it may not be 100% correct.)

Or you could make a hash table and do a replacement based on that.

Answer 10

Answered by Mimouni

Here is a simple function that I usually use to remove accents:

function str_without_accents($str, $charset='utf-8')
{
    $str = htmlentities($str, ENT_NOQUOTES, $charset);

    // Keep only the base letter of entities such as &eacute; or &ccedil;
    $str = preg_replace('#&([A-Za-z])(?:acute|cedil|caron|circ|grave|orn|ring|slash|th|tilde|uml);#', '$1', $str);
    $str = preg_replace('#&([A-Za-z]{2})(?:lig);#', '$1', $str); // for ligatures, e.g. '&oelig;'
    $str = preg_replace('#&[^;]+;#', '', $str); // strip any remaining entities

    return $str;   // or return mb_strtoupper($str); for uppercase :)
}
