Original question: http://stackoverflow.com/questions/1017599/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How do I remove accents from characters in a PHP string?
Asked by georgebrock
I'm attempting to remove accents from characters in a PHP string as the first step to making the string usable in a URL.
I'm using the following code:
$input = "Fóø Bår";
setlocale(LC_ALL, "en_US.utf8");
$output = iconv("utf-8", "ascii//TRANSLIT", $input);
print($output);
The output I would expect would be something like this:
F'oo Bar
However, instead of the accented characters being transliterated they are replaced with question marks:
F?? B?r
Everything I can find online indicates that setting the locale will fix this problem, but I'm already doing this. I've already checked the following details:
- The locale I am setting is supported by the server (included in the list produced by locale -a)
- The source and target encodings (UTF-8 and ASCII) are supported by the server's version of iconv (included in the list produced by iconv -l)
- The input string is UTF-8 encoded (verified using PHP's mb_check_encoding function, as suggested in the answer by mercator)
- The call to setlocale is successful (it returns 'en_US.utf8' rather than FALSE)
The cause of the problem:
The server is using the wrong implementation of iconv. It has the glibc version instead of the required libiconv version.
Note that the iconv function on some systems may not work as you expect. In such case, it'd be a good idea to install the GNU libiconv library. It will most likely end up with more consistent results.
– PHP manual's introduction to iconv
Details about the iconv implementation that is used by PHP are included in the output of the phpinfo function.
(I'm not able to re-compile PHP with the correct iconv library on the server I'm working with for this project so the answer I've accepted below is the one that was most useful for removing accents without iconv support.)
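As a quick way to check the same thing from a script rather than from phpinfo, here is a minimal sketch. ICONV_IMPL and ICONV_VERSION are constants defined by PHP's iconv extension; the helper name iconv_report is made up for this example, and the probe output will differ from server to server.

```php
<?php
// Hypothetical helper: reports which iconv build PHP is linked against.
function iconv_report(): string
{
    if (!extension_loaded('iconv')) {
        return 'iconv extension not loaded';
    }
    // ICONV_IMPL is typically "glibc" or "libiconv".
    return sprintf('%s %s', ICONV_IMPL, ICONV_VERSION);
}

echo iconv_report(), "\n";

// Probe whether //TRANSLIT yields letters or question marks on this build.
if (function_exists('iconv')) {
    setlocale(LC_ALL, 'en_US.utf8');
    var_dump(@iconv('UTF-8', 'ASCII//TRANSLIT', 'Fóø Bår'));
}
```

If the probe prints question marks or false, the table-based answers below are probably the safer route.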
Accepted answer by Jeremy Smyth
I think the problem here is that your encodings consider ä and å different symbols to 'a'. In fact, the PHP documentation for strtr offers a sample for removing accents the ugly way :(
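To illustrate the strtr idea, here is a rough sketch using the array form of strtr, which replaces whole keys and so should be safe with multibyte UTF-8 input. The function name strip_accents_strtr and the deliberately short map are assumptions for this example, not the sample from the PHP documentation.

```php
<?php
// Illustrative, incomplete map; extend it for the characters you expect.
function strip_accents_strtr(string $s): string
{
    static $map = [
        'á' => 'a', 'à' => 'a', 'â' => 'a', 'ä' => 'a', 'å' => 'a',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'ó' => 'o', 'ò' => 'o', 'ô' => 'o', 'ö' => 'o', 'ø' => 'o',
        'Å' => 'A', 'Ø' => 'O', 'ß' => 'ss', 'æ' => 'ae',
    ];
    // The array form matches whole keys, so multibyte characters are never split.
    return strtr($s, $map);
}

echo strip_accents_strtr('Fóø Bår'), "\n"; // prints "Foo Bar"
```

This assumes the script file and the input are both UTF-8 encoded.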
Answer by dynamic
What about the WordPress implementation?
function remove_accents($string) {
if ( !preg_match('/[\x80-\xff]/', $string) )
return $string;
$chars = array(
// Decompositions for Latin-1 Supplement
chr(195).chr(128) => 'A', chr(195).chr(129) => 'A',
chr(195).chr(130) => 'A', chr(195).chr(131) => 'A',
chr(195).chr(132) => 'A', chr(195).chr(133) => 'A',
chr(195).chr(135) => 'C', chr(195).chr(136) => 'E',
chr(195).chr(137) => 'E', chr(195).chr(138) => 'E',
chr(195).chr(139) => 'E', chr(195).chr(140) => 'I',
chr(195).chr(141) => 'I', chr(195).chr(142) => 'I',
chr(195).chr(143) => 'I', chr(195).chr(145) => 'N',
chr(195).chr(146) => 'O', chr(195).chr(147) => 'O',
chr(195).chr(148) => 'O', chr(195).chr(149) => 'O',
chr(195).chr(150) => 'O', chr(195).chr(153) => 'U',
chr(195).chr(154) => 'U', chr(195).chr(155) => 'U',
chr(195).chr(156) => 'U', chr(195).chr(157) => 'Y',
chr(195).chr(159) => 's', chr(195).chr(160) => 'a',
chr(195).chr(161) => 'a', chr(195).chr(162) => 'a',
chr(195).chr(163) => 'a', chr(195).chr(164) => 'a',
chr(195).chr(165) => 'a', chr(195).chr(167) => 'c',
chr(195).chr(168) => 'e', chr(195).chr(169) => 'e',
chr(195).chr(170) => 'e', chr(195).chr(171) => 'e',
chr(195).chr(172) => 'i', chr(195).chr(173) => 'i',
chr(195).chr(174) => 'i', chr(195).chr(175) => 'i',
chr(195).chr(177) => 'n', chr(195).chr(178) => 'o',
chr(195).chr(179) => 'o', chr(195).chr(180) => 'o',
chr(195).chr(181) => 'o', chr(195).chr(182) => 'o',
chr(195).chr(182) => 'o', chr(195).chr(185) => 'u',
chr(195).chr(186) => 'u', chr(195).chr(187) => 'u',
chr(195).chr(188) => 'u', chr(195).chr(189) => 'y',
chr(195).chr(191) => 'y',
// Decompositions for Latin Extended-A
chr(196).chr(128) => 'A', chr(196).chr(129) => 'a',
chr(196).chr(130) => 'A', chr(196).chr(131) => 'a',
chr(196).chr(132) => 'A', chr(196).chr(133) => 'a',
chr(196).chr(134) => 'C', chr(196).chr(135) => 'c',
chr(196).chr(136) => 'C', chr(196).chr(137) => 'c',
chr(196).chr(138) => 'C', chr(196).chr(139) => 'c',
chr(196).chr(140) => 'C', chr(196).chr(141) => 'c',
chr(196).chr(142) => 'D', chr(196).chr(143) => 'd',
chr(196).chr(144) => 'D', chr(196).chr(145) => 'd',
chr(196).chr(146) => 'E', chr(196).chr(147) => 'e',
chr(196).chr(148) => 'E', chr(196).chr(149) => 'e',
chr(196).chr(150) => 'E', chr(196).chr(151) => 'e',
chr(196).chr(152) => 'E', chr(196).chr(153) => 'e',
chr(196).chr(154) => 'E', chr(196).chr(155) => 'e',
chr(196).chr(156) => 'G', chr(196).chr(157) => 'g',
chr(196).chr(158) => 'G', chr(196).chr(159) => 'g',
chr(196).chr(160) => 'G', chr(196).chr(161) => 'g',
chr(196).chr(162) => 'G', chr(196).chr(163) => 'g',
chr(196).chr(164) => 'H', chr(196).chr(165) => 'h',
chr(196).chr(166) => 'H', chr(196).chr(167) => 'h',
chr(196).chr(168) => 'I', chr(196).chr(169) => 'i',
chr(196).chr(170) => 'I', chr(196).chr(171) => 'i',
chr(196).chr(172) => 'I', chr(196).chr(173) => 'i',
chr(196).chr(174) => 'I', chr(196).chr(175) => 'i',
chr(196).chr(176) => 'I', chr(196).chr(177) => 'i',
chr(196).chr(178) => 'IJ',chr(196).chr(179) => 'ij',
chr(196).chr(180) => 'J', chr(196).chr(181) => 'j',
chr(196).chr(182) => 'K', chr(196).chr(183) => 'k',
chr(196).chr(184) => 'k', chr(196).chr(185) => 'L',
chr(196).chr(186) => 'l', chr(196).chr(187) => 'L',
chr(196).chr(188) => 'l', chr(196).chr(189) => 'L',
chr(196).chr(190) => 'l', chr(196).chr(191) => 'L',
chr(197).chr(128) => 'l', chr(197).chr(129) => 'L',
chr(197).chr(130) => 'l', chr(197).chr(131) => 'N',
chr(197).chr(132) => 'n', chr(197).chr(133) => 'N',
chr(197).chr(134) => 'n', chr(197).chr(135) => 'N',
chr(197).chr(136) => 'n', chr(197).chr(137) => 'N',
chr(197).chr(138) => 'n', chr(197).chr(139) => 'N',
chr(197).chr(140) => 'O', chr(197).chr(141) => 'o',
chr(197).chr(142) => 'O', chr(197).chr(143) => 'o',
chr(197).chr(144) => 'O', chr(197).chr(145) => 'o',
chr(197).chr(146) => 'OE',chr(197).chr(147) => 'oe',
chr(197).chr(148) => 'R',chr(197).chr(149) => 'r',
chr(197).chr(150) => 'R',chr(197).chr(151) => 'r',
chr(197).chr(152) => 'R',chr(197).chr(153) => 'r',
chr(197).chr(154) => 'S',chr(197).chr(155) => 's',
chr(197).chr(156) => 'S',chr(197).chr(157) => 's',
chr(197).chr(158) => 'S',chr(197).chr(159) => 's',
chr(197).chr(160) => 'S', chr(197).chr(161) => 's',
chr(197).chr(162) => 'T', chr(197).chr(163) => 't',
chr(197).chr(164) => 'T', chr(197).chr(165) => 't',
chr(197).chr(166) => 'T', chr(197).chr(167) => 't',
chr(197).chr(168) => 'U', chr(197).chr(169) => 'u',
chr(197).chr(170) => 'U', chr(197).chr(171) => 'u',
chr(197).chr(172) => 'U', chr(197).chr(173) => 'u',
chr(197).chr(174) => 'U', chr(197).chr(175) => 'u',
chr(197).chr(176) => 'U', chr(197).chr(177) => 'u',
chr(197).chr(178) => 'U', chr(197).chr(179) => 'u',
chr(197).chr(180) => 'W', chr(197).chr(181) => 'w',
chr(197).chr(182) => 'Y', chr(197).chr(183) => 'y',
chr(197).chr(184) => 'Y', chr(197).chr(185) => 'Z',
chr(197).chr(186) => 'z', chr(197).chr(187) => 'Z',
chr(197).chr(188) => 'z', chr(197).chr(189) => 'Z',
chr(197).chr(190) => 'z', chr(197).chr(191) => 's'
);
$string = strtr($string, $chars);
return $string;
}
To better understand what this function does, check the corresponding conversion table:
À => A
Á => A
Â => A
Ã => A
Ä => A
Å => A
Ç => C
È => E
É => E
Ê => E
Ë => E
Ì => I
Í => I
Î => I
Ï => I
Ñ => N
Ò => O
Ó => O
Ô => O
Õ => O
Ö => O
Ù => U
Ú => U
Û => U
Ü => U
Ý => Y
ß => s
à => a
á => a
â => a
ã => a
ä => a
å => a
ç => c
è => e
é => e
ê => e
ë => e
ì => i
í => i
î => i
ï => i
ñ => n
ò => o
ó => o
ô => o
õ => o
ö => o
ù => u
ú => u
û => u
ü => u
ý => y
ÿ => y
Ā => A
ā => a
Ă => A
ă => a
Ą => A
ą => a
Ć => C
ć => c
Ĉ => C
ĉ => c
Ċ => C
ċ => c
Č => C
č => c
Ď => D
ď => d
Đ => D
đ => d
Ē => E
ē => e
Ĕ => E
ĕ => e
Ė => E
ė => e
Ę => E
ę => e
Ě => E
ě => e
Ĝ => G
ĝ => g
Ğ => G
ğ => g
Ġ => G
ġ => g
Ģ => G
ģ => g
Ĥ => H
ĥ => h
Ħ => H
ħ => h
Ĩ => I
ĩ => i
Ī => I
ī => i
Ĭ => I
ĭ => i
Į => I
į => i
İ => I
ı => i
Ĳ => IJ
ĳ => ij
Ĵ => J
ĵ => j
Ķ => K
ķ => k
ĸ => k
Ĺ => L
ĺ => l
Ļ => L
ļ => l
Ľ => L
ľ => l
Ŀ => L
ŀ => l
Ł => L
ł => l
Ń => N
ń => n
Ņ => N
ņ => n
Ň => N
ň => n
ŉ => N
Ŋ => n
ŋ => N
Ō => O
ō => o
Ŏ => O
ŏ => o
Ő => O
ő => o
Œ => OE
œ => oe
Ŕ => R
ŕ => r
Ŗ => R
ŗ => r
Ř => R
ř => r
Ś => S
ś => s
Ŝ => S
ŝ => s
Ş => S
ş => s
Š => S
š => s
Ţ => T
ţ => t
Ť => T
ť => t
Ŧ => T
ŧ => t
Ũ => U
ũ => u
Ū => U
ū => u
Ŭ => U
ŭ => u
Ů => U
ů => u
Ű => U
ű => u
Ų => U
ų => u
Ŵ => W
ŵ => w
Ŷ => Y
ŷ => y
Ÿ => Y
Ź => Z
ź => z
Ż => Z
ż => z
Ž => Z
ž => z
ſ => s
You can generate this conversion table yourself by simply iterating over the $chars array of the function:
foreach ($chars as $k => $v) {
    // One mapping per line, in the same "from => to" form as the table above
    printf("%s => %s\n", $k, $v);
}
Answer by Gino
This is a piece of code I found and use often:
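function stripAccents($stripAccents){
    // The string form of strtr() maps byte by byte, so this only works when the
    // script and its input use a single-byte encoding such as ISO-8859-1.
    return strtr($stripAccents,'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ','aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY');
}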
Answer by Cazuma Nii Cavalcanti
UTF-8 friendly version of the simple function posted above by Gino:
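function stripAccents($str) {
    // utf8_decode() converts UTF-8 to ISO-8859-1 so that strtr() can map single bytes.
    // Note that utf8_decode() is deprecated as of PHP 8.2.
    return strtr(utf8_decode($str), utf8_decode('àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ'), 'aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY');
}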
I had to resort to this because my PHP file was UTF-8 encoded.
Hope it helps.
Answer by gabo
If you have the intl extension (http://php.net/manual/en/book.intl.php) available, this solves your problem:
$string = "Fóø Bår";
$transliterator = Transliterator::createFromRules(':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD);
echo $normalized = $transliterator->transliterate($string);
Answer by langpavel
When using iconv, the locale must be set:
function test_enc($text = 'ěščřžýáíé ĚŠČŘŽÝÁÍÉ fóø bår FÓØ BÅR æ')
{
echo '<tt>';
echo iconv('utf8', 'ascii//TRANSLIT', $text);
echo '</tt><br/>';
}
test_enc();
setlocale(LC_ALL, 'cs_CZ.utf8');
test_enc();
setlocale(LC_ALL, 'en_US.utf8');
test_enc();
This yields:
????????? ????????? f?? b?r F?? B?R ae
escrzyaie ESCRZYAIE fo? bar FO? BAR ae
escrzyaie ESCRZYAIE fo? bar FO? BAR ae
I don't have any locales other than cs_CZ and en_US installed, so I can't test others.
In C# I have seen a solution that converts to a Unicode normalized form: the accents are split off and then filtered out via the nonspacing-mark Unicode category.
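The same normalize-then-filter idea can be sketched in PHP, assuming the intl extension is available (it provides the Normalizer class). The function name strip_marks is made up for this example. Letters that have no canonical decomposition, such as ø or ß, pass through unchanged, so this is only a partial solution.

```php
<?php
// Sketch: decompose to NFD, then drop the combining marks.
function strip_marks(string $s): string
{
    if (!class_exists('Normalizer')) {
        throw new RuntimeException('The intl extension is required.');
    }
    $decomposed = Normalizer::normalize($s, Normalizer::FORM_D);
    if ($decomposed === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    // \p{Mn} matches nonspacing marks, i.e. the split-off accents.
    return preg_replace('/\p{Mn}+/u', '', $decomposed);
}

if (class_exists('Normalizer')) {
    echo strip_marks('ěščřžýáíé'), "\n"; // prints "escrzyaie"
}
```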
Answer by Waiyl Karim
The easiest way is to use PHP's native iconv() function.
echo iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', "Thîs îs à vêry wrong séntènce!");
// output: This is a very wrong sentence!
Answer by Junior Mayhé
Indeed, it is a matter of taste. There are many ways to convert such letters.
function replaceAccents($str)
{
$a = array('À', 'Á', 'Â', 'Ã', 'Ä', 'Å', 'Æ', 'Ç', 'È', 'É', 'Ê', 'Ë', 'Ì', 'Í', 'Î', 'Ï', 'Ð', 'Ñ', 'Ò', 'Ó', 'Ô', 'Õ', 'Ö', 'Ø', 'Ù', 'Ú', 'Û', 'Ü', 'Ý', 'ß', 'à', 'á', 'â', 'ã', 'ä', 'å', 'æ', 'ç', 'è', 'é', 'ê', 'ë', 'ì', 'í', 'î', 'ï', 'ñ', 'ò', 'ó', 'ô', 'õ', 'ö', 'ø', 'ù', 'ú', 'û', 'ü', 'ý', 'ÿ', 'Ā', 'ā', 'Ă', 'ă', 'Ą', 'ą', 'Ć', 'ć', 'Ĉ', 'ĉ', 'Ċ', 'ċ', 'Č', 'č', 'Ď', 'ď', 'Đ', 'đ', 'Ē', 'ē', 'Ĕ', 'ĕ', 'Ė', 'ė', 'Ę', 'ę', 'Ě', 'ě', 'Ĝ', 'ĝ', 'Ğ', 'ğ', 'Ġ', 'ġ', 'Ģ', 'ģ', 'Ĥ', 'ĥ', 'Ħ', 'ħ', 'Ĩ', 'ĩ', 'Ī', 'ī', 'Ĭ', 'ĭ', 'Į', 'į', 'İ', 'ı', 'Ĳ', 'ĳ', 'Ĵ', 'ĵ', 'Ķ', 'ķ', 'Ĺ', 'ĺ', 'Ļ', 'ļ', 'Ľ', 'ľ', 'Ŀ', 'ŀ', 'Ł', 'ł', 'Ń', 'ń', 'Ņ', 'ņ', 'Ň', 'ň', 'ŉ', 'Ō', 'ō', 'Ŏ', 'ŏ', 'Ő', 'ő', 'Œ', 'œ', 'Ŕ', 'ŕ', 'Ŗ', 'ŗ', 'Ř', 'ř', 'Ś', 'ś', 'Ŝ', 'ŝ', 'Ş', 'ş', 'Š', 'š', 'Ţ', 'ţ', 'Ť', 'ť', 'Ŧ', 'ŧ', 'Ũ', 'ũ', 'Ū', 'ū', 'Ŭ', 'ŭ', 'Ů', 'ů', 'Ű', 'ű', 'Ų', 'ų', 'Ŵ', 'ŵ', 'Ŷ', 'ŷ', 'Ÿ', 'Ź', 'ź', 'Ż', 'ż', 'Ž', 'ž', 'ſ', 'ƒ', 'Ơ', 'ơ', 'Ư', 'ư', 'Ǎ', 'ǎ', 'Ǐ', 'ǐ', 'Ǒ', 'ǒ', 'Ǔ', 'ǔ', 'Ǖ', 'ǖ', 'Ǘ', 'ǘ', 'Ǚ', 'ǚ', 'Ǜ', 'ǜ', 'Ǻ', 'ǻ', 'Ǽ', 'ǽ', 'Ǿ', 'ǿ');
$b = array('A', 'A', 'A', 'A', 'A', 'A', 'AE', 'C', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I', 'D', 'N', 'O', 'O', 'O', 'O', 'O', 'O', 'U', 'U', 'U', 'U', 'Y', 's', 'a', 'a', 'a', 'a', 'a', 'a', 'ae', 'c', 'e', 'e', 'e', 'e', 'i', 'i', 'i', 'i', 'n', 'o', 'o', 'o', 'o', 'o', 'o', 'u', 'u', 'u', 'u', 'y', 'y', 'A', 'a', 'A', 'a', 'A', 'a', 'C', 'c', 'C', 'c', 'C', 'c', 'C', 'c', 'D', 'd', 'D', 'd', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'G', 'g', 'G', 'g', 'G', 'g', 'G', 'g', 'H', 'h', 'H', 'h', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'IJ', 'ij', 'J', 'j', 'K', 'k', 'L', 'l', 'L', 'l', 'L', 'l', 'L', 'l', 'l', 'l', 'N', 'n', 'N', 'n', 'N', 'n', 'n', 'O', 'o', 'O', 'o', 'O', 'o', 'OE', 'oe', 'R', 'r', 'R', 'r', 'R', 'r', 'S', 's', 'S', 's', 'S', 's', 'S', 's', 'T', 't', 'T', 't', 'T', 't', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'W', 'w', 'Y', 'y', 'Y', 'Z', 'z', 'Z', 'z', 'Z', 'z', 's', 'f', 'O', 'o', 'U', 'u', 'A', 'a', 'I', 'i', 'O', 'o', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'A', 'a', 'AE', 'ae', 'O', 'o');
return str_replace($a, $b, $str);
}
Answer by Xetius
You could use urlencode. It does not quite do what you want (remove accents), but it will give you a URL-usable string:
$output = urlencode ($input);
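For reference, a small sketch of what that produces for the string from the question, assuming the script file is saved as UTF-8. rawurlencode is shown alongside because it encodes the space as %20 rather than +.

```php
<?php
$input = 'Fóø Bår';

// Each non-ASCII character becomes its percent-encoded UTF-8 bytes.
echo urlencode($input), "\n";    // prints "F%C3%B3%C3%B8+B%C3%A5r"
echo rawurlencode($input), "\n"; // prints "F%C3%B3%C3%B8%20B%C3%A5r"
```

The result is valid in a URL but not very readable, which is why the other answers strip the accents instead.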
In Perl I could use a translate regex, but I cannot think of the PHP equivalent:
$input =~ tr/áâàå/aaaa/;
etc...
You could do this using preg_replace:
// The /u modifier makes PCRE treat pattern and subject as UTF-8;
// no | separator is needed inside a character class.
$patterns[0] = '/[áâàåä]/u';
$patterns[1] = '/[éêèë]/u';
$patterns[2] = '/[íîìï]/u';
$patterns[3] = '/[óôòøõö]/u';
$patterns[4] = '/[úûùü]/u';
$patterns[5] = '/æ/u';
$patterns[6] = '/ç/u';
$patterns[7] = '/ß/u';
$replacements[0] = 'a';
$replacements[1] = 'e';
$replacements[2] = 'i';
$replacements[3] = 'o';
$replacements[4] = 'u';
$replacements[5] = 'ae';
$replacements[6] = 'c';
$replacements[7] = 'ss';
$output = preg_replace($patterns, $replacements, $input);
(Please note this was typed from a foggy, beer-ridden Friday afternoon memory, so it may not be 100% correct.)
Or you could make a hash table and do a replacement based on that.
Answer by Mimouni
Here is a simple function that I usually use to remove accents:
function str_without_accents($str, $charset='utf-8')
{
    $str = htmlentities($str, ENT_NOQUOTES, $charset);
    // Keep the base letter of each accented-letter entity, e.g. "&eacute;" becomes "e"
    $str = preg_replace('#&([A-Za-z])(?:acute|cedil|caron|circ|grave|orn|ring|slash|th|tilde|uml);#', '$1', $str);
    $str = preg_replace('#&([A-Za-z]{2})(?:lig);#', '$1', $str); // for ligatures, e.g. "&oelig;" becomes "oe"
    $str = preg_replace('#&[^;]+;#', '', $str); // remove any remaining entities
    return $str; // or return mb_strtoupper($str) for uppercase :)
}
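Since the question describes accent removal as only the first step toward a URL-usable string, here is a rough sketch of one possible follow-up step. The helper to_slug is hypothetical and not from any answer above; it assumes its input has already been reduced to ASCII by one of the functions in this thread.

```php
<?php
// Hypothetical helper: turns an already accent-free string into a URL slug.
function to_slug(string $ascii): string
{
    $slug = strtolower($ascii);
    // Collapse every run of characters other than a-z and 0-9 into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    return trim($slug, '-');
}

echo to_slug('Foo Bar!'), "\n"; // prints "foo-bar"
```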

