PHP iconv translit for removing accents: not working as expected?

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/4910627/

Date: 2020-08-25 16:14:29  Source: igfitidea


Tags: php, string, unicode, utf-8, unicode-normalization

Question by dynamic

Consider this simple code:

echo iconv('UTF-8', 'ASCII//TRANSLIT', 'è');

It prints

`e

instead of just

e

Do you know what I am doing wrong?



Nothing changed after adding setlocale:

setlocale(LC_COLLATE, 'en_US.utf8');
echo iconv('UTF-8', 'ASCII//TRANSLIT', 'è');

Answer by Hidde

I have this standard function to turn text into a valid URL string, without the invalid URL characters. The magic seems to be in the line after the // remove unwanted characters comment: that regular expression deletes every character that is not a word character or a hyphen, including leftovers of the transliteration such as the backtick in `e.

This is taken from the Symfony framework documentation: http://www.symfony-project.org/jobeet/1_4/Doctrine/en/08, which in turn is taken from http://php.vrana.cz/vytvoreni-pratelskeho-url.php, but I don't speak Czech ;-)

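function slugify($text)
{
  // replace each run of non-letters/non-digits by a single -
  $text = preg_replace('#[^\pL\d]+#u', '-', $text);

  // trim leading and trailing dashes
  $text = trim($text, '-');

  // transliterate (the result depends on the iconv implementation and locale)
  if (function_exists('iconv'))
  {
    $converted = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    // iconv() returns false on failure; keep the text we had in that case
    if ($converted !== false)
    {
      $text = $converted;
    }
  }

  // lowercase
  $text = strtolower($text);

  // remove unwanted characters, e.g. the ` or ' some iconv builds emit ("`e")
  $text = preg_replace('#[^-\w]+#', '', $text);

  // compare with '' instead of empty(), which would also reject the slug "0"
  if ($text === '')
  {
    return 'n-a';
  }

  return $text;
}

echo slugify('é'); // --> "e"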

Answer by eleg

cf. @tchrist, using the intl PHP extension:

http://fr2.php.net/manual/en/book.intl.php

preg_replace('/\pM+/u', '', normalizer_normalize($mystring, Normalizer::FORM_D));

eéèê?i??o??uù?üaa?? ? ??? ?????ū???????? ?ǖ

becomes

eeeeeiiiooouuuuaaaA Η OaA A???uU?O?????? ?u
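A slightly fuller sketch of the one-liner above, assuming the intl extension is loaded so that the Normalizer class exists. It removes only nonspacing marks (\p{Mn}) rather than all marks (\pM), and the function name remove_accents_nfd is made up for this example. Characters without a canonical decomposition, such as Đ or Ð, come back unchanged, which is the limitation described next.

```php
<?php
// Hypothetical helper: strip accents via canonical decomposition (requires ext-intl).
function remove_accents_nfd(string $text): string
{
    // NFD splits "è" into "e" followed by U+0300 COMBINING GRAVE ACCENT.
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        // normalize() returns false for invalid UTF-8; leave the input untouched.
        return $text;
    }
    // Drop the nonspacing combining marks and keep the base letters.
    return preg_replace('/\p{Mn}+/u', '', $decomposed);
}
```

For example, remove_accents_nfd('Crème brûlée') returns 'Creme brulee', while remove_accents_nfd('Đ') returns 'Đ' unchanged.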



As tchrist emphasises, not all Unicode characters are decomposable:

Extracts from the Unicode charts:

U0080.pdf

00CF Ï LATIN CAPITAL LETTER I WITH DIAERESIS

≡ 0049 I 0308 ¨

NB: the symbol ≡ indicates an available decomposition.

00D0 Ð LATIN CAPITAL LETTER ETH

→ 00F0 ð latin small letter eth

→ 0110 Đ latin capital letter d with stroke

→ 0189 Ɖ latin capital letter african d


No decomposition is available here (the → lines are only cross-references to similar characters), which IMHO is strange: we could consider the ASCII letter D an acceptable equivalent.

U0100.pdf

0110 Đ LATIN CAPITAL LETTER D WITH STROKE

→ 00D0 Ð latin capital letter eth

→ 0111 đ latin small letter d with stroke

→ 0189 Ɖ latin capital letter african d


Even stranger: this one is identified as LATIN CAPITAL LETTER D (with stroke), yet it is not decomposable! Perhaps a cooler solution would be to get the Unicode description (name) of each character, compare it with the description of each ASCII character, and replace accordingly. Anyone? ;-]

cf http://unicode.org/Public/UNIDATA/UnicodeData.txt
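The name-based idea above could be sketched with IntlChar::charName() from the intl extension (PHP 7+): look up each character's Unicode name and, if it has the form LATIN CAPITAL/SMALL LETTER X, optionally followed by WITH ..., replace it with the ASCII letter X. This is only a sketch, and ascii_fold_by_name is a hypothetical helper, not a standard function. Letters whose names do not follow that pattern, such as ETH or SHARP S, are left as they are.

```php
<?php
// Hypothetical helper: fold Latin letters to ASCII using their Unicode names
// (requires ext-intl for IntlChar).
function ascii_fold_by_name(string $text): string
{
    $out = '';
    // Split the string into individual UTF-8 characters.
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [] as $char) {
        if (strlen($char) === 1) {
            // A single byte in valid UTF-8 is plain ASCII: keep it.
            $out .= $char;
            continue;
        }
        $name = IntlChar::charName($char);
        // e.g. "LATIN CAPITAL LETTER D WITH STROKE" -> capital "D"
        if ($name !== null
            && preg_match('/^LATIN (CAPITAL|SMALL) LETTER ([A-Z])(?: WITH .+)?$/', $name, $m)) {
            $out .= $m[1] === 'CAPITAL' ? $m[2] : strtolower($m[2]);
        } else {
            // No usable name (ETH, SHARP S, non-Latin...): keep the original character.
            $out .= $char;
        }
    }
    return $out;
}
```

With this approach Đ and đ fold to D and d, which canonical decomposition cannot do, while Ð (LATIN CAPITAL LETTER ETH) is still left alone because its name does not contain a base letter.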

Answer by Stone

This happened to me with the plain iconv command-line tool, without PHP. The trick was to set the LANG environment variable to en_US.UTF-8 (in my case it was hu_HU.UTF-8 before). After that it worked as expected.

Answer by Michael Parkin

When doing transliteration, you have to make sure that your LC_COLLATE is properly set; otherwise the default POSIX locale will be used.

Look at http://uk3.php.net/manual/en/function.setlocale.php
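A sketch of that advice, with two assumptions stated up front: that the server's iconv (glibc, for instance) reads its transliteration rules from the LC_CTYPE category rather than LC_COLLATE (the question reports that setting LC_COLLATE alone changed nothing), and that the en_US.UTF-8 locale is installed. The helper name translit_with_locale is hypothetical. Because setlocale() changes the locale for the whole process, the helper restores the previous setting afterwards.

```php
<?php
// Hypothetical helper: run iconv's //TRANSLIT under a given LC_CTYPE locale.
// Assumption: the underlying iconv takes its transliteration rules from LC_CTYPE,
// and the requested locale is installed on the system.
function translit_with_locale(string $text, string $locale = 'en_US.UTF-8')
{
    // Passing '0' only queries the current LC_CTYPE setting.
    $previous = setlocale(LC_CTYPE, '0');

    // Returns false, leaving the locale unchanged, if $locale is not installed.
    setlocale(LC_CTYPE, $locale);

    // iconv() emits a notice and returns false if the input cannot be converted.
    $result = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    // setlocale() is process-wide, so put the previous value back.
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }

    return $result; // string on success, false on failure
}

var_dump(translit_with_locale('è'));
```

Whether this yields "e" still depends on the iconv build and on the locales installed, so it is worth checking on the target server.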

Answer by Mike Sherrill 'Cat Recall'

I'm tempted to say "nothing", although this is a little outside my expertise. PHP's iconv() is notorious and has inspired many workarounds, including:

  • dropping to the system's iconv utility (Unix & Linux)
  • crafting a lookup table
  • replacing all accented characters with an ASCII equivalent as a kind of preprocessing stage
  • setting LC_COLLATE (which doesn't seem to work for everyone)
  • using htmlentities() instead of iconv()
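The htmlentities() workaround is usually done roughly like this: htmlentities() turns è into the named entity &egrave;, a regular expression keeps only the base letter of entities such as &eacute;, &uuml; or &ccedil;, and html_entity_decode() turns every other entity back into its character. This is a sketch with a hypothetical function name; it only covers characters that HTML 4 gives a named entity (mostly Latin-1), so Đ or ǖ, for example, pass through unchanged.

```php
<?php
// Hypothetical helper: strip accents by round-tripping through HTML 4 entities.
function strip_accents_via_entities(string $text): string
{
    // "è" -> "&egrave;", "ç" -> "&ccedil;", "&" -> "&amp;", ...
    $encoded = htmlentities($text, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, 'UTF-8');

    // Keep only the base letter of accent entities: "&egrave;" -> "e".
    $stripped = preg_replace(
        '/&([A-Za-z])(?:acute|grave|circ|tilde|uml|ring|cedil|slash);/',
        '$1',
        $encoded
    );

    // Decode the remaining entities ("&amp;", "&szlig;", ...) back to characters.
    return html_entity_decode($stripped, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}
```

It needs nothing beyond the standard extensions, but letters such as ß or æ are simply left as they are.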

Read the comments on the iconv() documentation for more inspiration. (Or commiseration. Too close to call.)

Answer by Xeoncross

It seems the standard way to handle this is a "remove accents" function, such as those found in libraries like Flourish or CMSs like WordPress. iconv seems unable to strip accents (and rightly so), since doing so isn't a good idea for anything other than URL slugs.
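Such remove-accents functions are essentially lookup tables. A minimal sketch in the same spirit (not Flourish's or WordPress's actual code; the table is a small hypothetical excerpt you would extend for your own data) uses strtr() with an array, which replaces the multi-byte UTF-8 keys directly and does not depend on the locale or on the iconv build:

```php
<?php
// Hypothetical, deliberately short table; real implementations cover far more characters.
const ACCENT_MAP = [
    'à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a', 'å' => 'a',
    'ç' => 'c',
    'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 'È' => 'E', 'É' => 'E',
    'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
    'ñ' => 'n',
    'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'ö' => 'o', 'ø' => 'o',
    'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u',
    'Đ' => 'D', 'đ' => 'd', 'Ð' => 'D', 'ß' => 'ss',
];

function remove_accents_table(string $text): string
{
    // With an array, strtr() tries the longest keys first and never
    // re-scans text it has already replaced.
    return strtr($text, ACCENT_MAP);
}
```

The trade-off is coverage: anything missing from the table passes through untouched, but the result is the same on every server.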

Answer by fred727

It seems that it depends on the PHP version...



TestCase #1

php -version

PHP 7.0.0RC8 (cli) (built: Nov 25 2015 12:36:50) ( NTS )
Copyright (c) 1997-2015 The PHP Group
Zend Engine v3.0.0, Copyright (c) 1998-2015 Zend Technologies
    with Zend OPcache v7.0.6-dev, Copyright (c) 1999-2015, by Zend Technologies

php -r "var_dump(iconv('UTF-8', 'ASCII//TRANSLIT', 'è'));"

string(2) "`e"


TestCase #2

php -version

PHP 7.0.8-1~dotdeb+8.1 (cli) ( NTS )
Copyright (c) 1997-2016 The PHP Group
Zend Engine v3.0.0, Copyright (c) 1998-2016 Zend Technologies
    with Zend OPcache v7.0.8-1~dotdeb+8.1, Copyright (c) 1999-2016, by Zend Technologies

php -r "var_dump(iconv('UTF-8', 'ASCII//TRANSLIT', 'è'));"

string(1) "e"
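One way to check whether the difference comes from the PHP version itself or from the iconv library PHP was built against is to print the ICONV_IMPL and ICONV_VERSION constants defined by the iconv extension next to the test result. It is an assumption, not something the test cases above show, that the two builds use different implementations (for example GNU libiconv versus glibc's built-in iconv); describe_iconv is a hypothetical helper.

```php
<?php
// Hypothetical helper: report PHP version, iconv implementation and the question's test.
function describe_iconv(): string
{
    if (!extension_loaded('iconv')) {
        return 'iconv extension not loaded';
    }
    // Suppress the notice iconv() emits on failure; false is reported instead.
    $result = @iconv('UTF-8', 'ASCII//TRANSLIT', 'è');
    return sprintf(
        'PHP %s, iconv implementation: %s %s, result: %s',
        PHP_VERSION,
        ICONV_IMPL,     // e.g. "glibc" or "libiconv"
        ICONV_VERSION,
        var_export($result, true)
    );
}

echo describe_iconv(), "\n";
```

Running this on both machines would show whether they differ only in PHP version or also in the iconv implementation.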