php: How do I remove accents and convert letters to "plain" ASCII characters?

Question

Asked by Mark Lalor

What is the most efficient way to remove accents from a string, e.g. "ÈâuÑ" becomes "Eaun"?

Is there a simple, built-in way that I'm missing, or a regular expression?

Answer 1

Answered by Piskvor left the building

If you have iconv installed, try this (the example assumes your input string is in UTF-8):

echo iconv('UTF-8', 'ASCII//TRANSLIT', $string);

(iconv is a library to convert between all kinds of encodings; it's efficient and included with many PHP distributions by default. Most of all, it's definitely easier and more error-proof than trying to roll your own solution (did you know that there's a "Latin letter N with a curl"? Me neither.))
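
One caveat, stated as an assumption based on glibc's behaviour: with glibc, the //TRANSLIT suffix consults the LC_CTYPE locale, and in the default "C" locale accented letters typically come out as "?". A minimal sketch that switches LC_CTYPE only for the duration of the call might look like this; the function name iconvToAscii() and the locale names are hypothetical, and at least one of those locales has to be installed. Note that setlocale() affects the whole process, not just the current thread.

```php
<?php
function iconvToAscii(string $s): string
{
    $previous = setlocale(LC_CTYPE, '0'); // '0' only queries the current setting
    // Hypothetical locale names: pick one that is installed on your system.
    setlocale(LC_CTYPE, 'en_US.UTF-8', 'C.UTF-8');
    try {
        $out = iconv('UTF-8', 'ASCII//TRANSLIT', $s);
    } finally {
        if ($previous !== false) {
            setlocale(LC_CTYPE, $previous); // restore the caller's locale
        }
    }
    // iconv() returns false on failure; fall back to an empty string.
    return $out === false ? '' : $out;
}

echo iconvToAscii('ÈâuÑ'), PHP_EOL;
```

The exact result still depends on which iconv implementation PHP was built against, so it is worth checking on the target system.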

Answer 2

Answered by SimonSimCity

I found a solution that worked in all my test cases (copied from http://php.net/manual/en/transliterator.transliterate.php):

var_dump(transliterator_transliterate('Any-Latin; Latin-ASCII; [\u0080-\u7fff] remove',
    "A æ Übérmensch på høyeste nivå! И я люблю PHP! есть. ﬁ ¦"));
// string(56) "A ae Ubermensch pa hoyeste niva! I a lublu PHP! est. fi "

see: http://www.php.net/normalizer

EDIT: This solution is independent of the locale set with setlocale(). Another benefit over iconv() is that even non-Latin characters are not ignored.
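
Since the question asks for the most efficient way: passing the rule string to transliterator_transliterate() on every call presumably means building the transliterator each time. A minimal sketch, assuming the intl extension is loaded, compiles the same rule once with Transliterator::create() and reuses the object; the function name toAsciiIcu() is hypothetical.

```php
<?php
// Requires the intl extension. toAsciiIcu() is a hypothetical name.
function toAsciiIcu(string $s): string
{
    // Build the compound transliterator once and keep it for the rest of the run.
    static $tr = null;
    if ($tr === null) {
        $tr = Transliterator::create('Any-Latin; Latin-ASCII; [\u0080-\u7fff] remove');
        if ($tr === null) {
            throw new RuntimeException('Cannot create transliterator: ' . intl_get_error_message());
        }
    }
    $out = $tr->transliterate($s);
    // transliterate() returns false on failure.
    return $out === false ? '' : $out;
}
```

Usage would be e.g. echo toAsciiIcu($string); the rule ID is exactly the one from the answer, so the output should match transliterator_transliterate().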

EDIT2: I discovered that some characters are not covered by the transliteration I originally posted. Any-Latin translates the Cyrillic character ь to a character that doesn't fit into a Latin character set: ʹ (http://en.wikipedia.org/wiki/Prime_%28symbol%29). I've added [\u0100-\u7fff] remove to remove all these non-Latin characters. I also added a test to the text ;)

I suppose that by Latin they mean the Latin alphabet here, not one of the Latin character sets. But anyway, in my opinion Latin-ASCII should then transliterate it to something ASCII...

EDIT3: Sorry for another change here. I had to extend the range down to \u0080 instead of \u0100 to get only ASCII characters as output. The test above has been updated.
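
Even with the \u0080 lower bound, the filter [\u0080-\u7fff] only covers code points up to U+7FFF. If the output must be strictly ASCII, one option (an assumption about your requirements, not part of the original answer) is a final byte-level pass over the transliterated string, which would also drop characters above U+7FFF that the transliterator leaves alone, presumably including emoji. stripNonAscii() is a hypothetical name.

```php
<?php
// Hypothetical helper: remove every byte >= 0x80. Without the /u modifier
// PCRE works on bytes, so all bytes of a multi-byte UTF-8 sequence are
// removed and the result is plain ASCII.
function stripNonAscii(string $s): string
{
    return preg_replace('/[\x80-\xFF]/', '', $s) ?? '';
}

var_dump(stripNonAscii("est. \u{1F600}")); // string(5) "est. "
```

It would be applied to the result of the transliteration, e.g. stripNonAscii(transliterator_transliterate($rules, $text)), where $rules is the rule string from the answer.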

Answer 3

Answered by neokio

Reposting this at the request of @palantir...

I find iconv completely unreliable, and I dislike preg_replace solutions and big arrays ... so my favorite way (and the only reliable method I've found) is ...

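function toASCII( $str )
{
    // utf8_decode() converts UTF-8 to ISO-8859-1 and is deprecated as of PHP 8.2.
    // Characters outside ISO-8859-1 (including Š Œ Ž š œ ž Ÿ below) become '?',
    // and strtr() then maps every '?' to 'Y', the last of those seven targets.
    return strtr(utf8_decode($str),
        utf8_decode(
        'ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ'),
        'SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy');
}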
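
A possible variant of the same lookup-table idea, sketched under the assumption that the PHP source file is saved as UTF-8: strtr() also accepts an array of replacements, which works on UTF-8 strings directly (so utf8_decode() is not needed) and allows multi-letter replacements such as æ to ae or ß to ss, which the character-for-character form cannot express. The function name toAsciiMap() and the map itself are illustrative and deliberately incomplete.

```php
<?php
// Illustrative, incomplete map; extend it for the languages you need.
function toAsciiMap(string $s): string
{
    static $map = [
        'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Å' => 'A', 'Æ' => 'AE',
        'Ç' => 'C', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ì' => 'I', 'Í' => 'I',
        'Î' => 'I', 'Ï' => 'I', 'Ð' => 'D', 'Ñ' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O',
        'Õ' => 'O', 'Ö' => 'O', 'Ø' => 'O', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'U',
        'Ý' => 'Y', 'Š' => 'S', 'Ž' => 'Z', 'Œ' => 'OE', 'Ÿ' => 'Y',
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'å' => 'a', 'æ' => 'ae',
        'ç' => 'c', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'ì' => 'i', 'í' => 'i',
        'î' => 'i', 'ï' => 'i', 'ð' => 'd', 'ñ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o',
        'õ' => 'o', 'ö' => 'o', 'ø' => 'o', 'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u',
        'ý' => 'y', 'ÿ' => 'y', 'š' => 's', 'ž' => 'z', 'œ' => 'oe', 'ß' => 'ss',
    ];
    // With an array, strtr() replaces whole substrings (longest keys first)
    // and never re-scans text it has already replaced.
    return strtr($s, $map);
}

echo toAsciiMap('ÈâuÑ'), PHP_EOL; // EauN
```

Unlike toASCII(), characters missing from the map are left untouched rather than turned into '?'.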

Answer 4

Answered by Gumbo

You can use iconv to transliterate the characters to plain US-ASCII and then use a regular expression to remove non-alphabetic characters:

preg_replace('/[^a-z]/i', '', iconv("UTF-8", "US-ASCII//TRANSLIT", $text))
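
As written, [^a-z] with the /i flag removes everything that is not a letter, including spaces and digits. If those should survive, a small variant can widen the character class; this is a sketch, and the function name iconvAsciiWords() is hypothetical.

```php
<?php
// Hypothetical variant of the one-liner above that keeps digits and spaces.
function iconvAsciiWords(string $text): string
{
    $ascii = iconv('UTF-8', 'US-ASCII//TRANSLIT', $text);
    if ($ascii === false) {
        return ''; // conversion failed
    }
    // Drop everything except ASCII letters, digits and spaces, e.g. a '?'
    // that TRANSLIT may emit for a character it cannot transliterate.
    return preg_replace('/[^A-Za-z0-9 ]/', '', $ascii) ?? '';
}

echo iconvAsciiWords('Hello, World 42!'), PHP_EOL; // Hello World 42
```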

Another way would be to use the Normalizer to normalize to Normalization Form KD (NFKD) and then remove the mark characters:

preg_replace('/\p{Mn}/u', '', Normalizer::normalize($text, Normalizer::FORM_KD))
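
To make the mechanism concrete: NFKD splits a precomposed letter such as é (U+00E9) into e followed by U+0301 COMBINING ACUTE ACCENT, whose general category is Mn (nonspacing mark), and it also expands compatibility characters, e.g. the ligature ﬁ into fi. Removing \p{Mn} then leaves only the base letters. The sketch below assumes the intl extension; stripMarks() is a hypothetical name. Letters that have no decomposition, such as ø, æ or ß, pass through unchanged, so this approach would need to be combined with another one if pure ASCII is required.

```php
<?php
// Requires the intl extension (class Normalizer).
function stripMarks(string $text): string
{
    // FORM_KD = compatibility decomposition: "é" -> "e" + U+0301, "ﬁ" -> "fi".
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_KD);
    if ($decomposed === false) {
        return ''; // normalization failed, e.g. on invalid UTF-8 input
    }
    // \p{Mn} = nonspacing marks; /u makes PCRE treat the subject as UTF-8.
    return preg_replace('/\p{Mn}/u', '', $decomposed) ?? '';
}

if (class_exists('Normalizer')) {
    var_dump(stripMarks('Übérmensch')); // string(10) "Ubermensch"
    var_dump(stripMarks('høyeste'));    // "ø" has no decomposition, so it survives
}
```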

Answer 5

Answered by Johnny Broadway

Note: I'm reposting this from another similar question in the hope that it's helpful to others.

I ended up writing a PHP library based on URLify.js from the Django project, since I found iconv() to be too incomplete. You can find it here:

https://github.com/jbroadway/urlify

Handles Latin characters as well as Greek, Turkish, Russian, Ukrainian, Czech, Polish, and Latvian.
