PHP Preg_Replace and UTF8

Notice: this page is a Chinese-English parallel translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/2063192/

Time: 2020-08-25 04:52:29  Source: igfitidea

Preg_Replace and UTF8

Tags: php, regex, utf-8

Asked by Jan Hančič

I'm enhancing our video search page to highlight the search term(s) in the results. Because a user can enter "judas priest" while a video has "Judas Priest" in its text, I have to use regular expressions to preserve the case of the original text.

My code works, but I have problems with special characters like č, š and ž: it seems that Preg_Replace() will only match if the case is the same (despite the /ui modifiers). My code:

$Content = Preg_Replace ( '/\b(' . $term . '?)\b/iu', '<span class="HighlightTerm">$1</span>', $Content );

I also tried this:

$Content = Mb_Eregi_Replace ( '\b(' . $term . '?)\b', '<span class="HighlightTerm">\1</span>', $Content );

But that doesn't work either. It will match "SREČA" if the search term is "SREČA", but if the search term is "sreča" it will not match it (and vice versa).

So how do I make this work?

Update: I set the locale and internal encoding:

Mb_Internal_Encoding ( 'UTF-8' );
$loc = "UTF-8";
putenv("LANG=$loc");
$loc = setlocale(LC_ALL, $loc);

Answered by Jan Hančič

I feel really stupid right about now, but the problem wasn't with the Preg_* functions at all. I don't know why, but I first checked whether the given term is even in the string with StriPos, and since that function is not multi-byte safe, it returned false if the case of the text was not the same as the search term, so Preg_Replace wasn't even called.

So the lesson to be learned here is: always use the multi-byte versions of functions if you have UTF8 strings.
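
To illustrate the difference, here is a minimal sketch (not the code from my page; the sample strings are made up, and it assumes the mbstring extension is loaded and the source file is saved as UTF-8). stripos() only folds case for single bytes, so it misses the term when a multi-byte letter differs in case, while mb_stripos() compares whole characters:

```php
<?php
// Sample data, assumed for illustration only.
$haystack = 'Video: SREČA';
$needle   = 'sreča';

// stripos() compares bytes; Č (0xC4 0x8C) and č (0xC4 0x8D) never fold to each other.
$bytePos = stripos($haystack, $needle);

// mb_stripos() folds case per character in the given encoding.
$charPos = mb_stripos($haystack, $needle, 0, 'UTF-8');

var_dump($bytePos); // bool(false)
var_dump($charPos); // int(7)
```

With a pre-check like this, the highlighting code only runs when the multi-byte variant is used.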

Answered by gnarf

Not sure where your problem is stemming from, but I just put together this little test case:

<?php

$uc = "SREČA";

mb_internal_encoding('utf-8');
echo $uc."\n";
$lc = mb_strtolower($uc);
echo $lc."\n";

echo preg_replace("/\b(".preg_quote($uc).")\b/ui", "<span class='test'>\\1</span>", "test:".$lc." end test");

Its output on my machine:

SREČA
sreča
test:<span class='test'>sreča</span> end test

Seems to be working properly?
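
For the search page itself, the same idea could be wrapped in a small helper. This is only a sketch: highlightTerm() is a hypothetical name, the sample strings are made up, and it assumes UTF-8 input. It escapes the user's term with preg_quote() and uses $1 to put the matched text back, so the original case is kept:

```php
<?php
// Hypothetical helper, not part of the original question's code.
function highlightTerm(string $content, string $term): string
{
    // Escape regex metacharacters in the term, including the "/" delimiter.
    $pattern = '/\b(' . preg_quote($term, '/') . ')\b/iu';

    // $1 re-inserts the text as it appears in $content, preserving its case.
    $result = preg_replace($pattern, '<span class="HighlightTerm">$1</span>', $content);

    // preg_replace() returns null on error (for example invalid UTF-8); fall back to the input.
    return $result ?? $content;
}

$text = 'Judas Priest live, SREČA';
$a = highlightTerm($text, 'judas priest');
$b = highlightTerm($text, 'sreča');

echo $a, "\n"; // <span class="HighlightTerm">Judas Priest</span> live, SREČA
echo $b, "\n"; // Judas Priest live, <span class="HighlightTerm">SREČA</span>
```

Note that the term is treated as literal text here; if you want optional suffixes or other regex features, append them after the preg_quote() call.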

Answered by troelskn

If I'm not mistaken, preg_match uses the current locale. Try setting the locale to the language these characters belong to. You probably need a UTF-8 based locale too. If you have mixed languages in your page, you may be able to find a generic international locale that works.

See also: http://www.phpwact.org/php/i18n/utf-8
