PHP Preg_Replace and UTF-8

Question

Asked by Jan Hančič

I'm enhancing our video search page to highlight the search term(s) in the results. Because a user can enter judas priest and a video has Judas Priest in its text, I have to use regular expressions to preserve the case of the original text.

My code works, but I have problems with special characters like č, š and ž. It seems that Preg_Replace() will only match if the case is the same (despite the /ui modifiers). My code:

$Content = Preg_Replace ( '/\b(' . $term . '?)\b/iu', '<span class="HighlightTerm">$1</span>', $Content );

I also tried this:

$Content = Mb_Eregi_Replace ( '\b(' . $term . '?)\b', '<span class="HighlightTerm">\1</span>', $Content );

But it also doesn't work. It will match "SREČA" if the search term is "SREČA", but if the search term is "sreča" it will not match it (and vice versa).

So how do I make this work?

Update: I set the locale and internal encoding:

Mb_Internal_Encoding ( 'UTF-8' );
$loc = "UTF-8";
putenv("LANG=$loc");
$loc = setlocale(LC_ALL, $loc);

Answer 1

Answered by Jan Hančič

I feel really stupid right about now, but the problem wasn't with the Preg_* functions at all. I don't know why, but I first checked whether the given term was even in the string with StriPos, and since that function is not multi-byte safe, it returned false if the case of the text was not the same as the search term, so Preg_Replace wasn't even called.

So the lesson to be learned here is to always use the multi-byte versions of string functions if you have UTF-8 strings.
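
To illustrate the point, here is a minimal sketch of a multi-byte-safe pre-check. The helper name containsTerm and the sample strings are assumptions made for this example; the original guard code was not posted. It assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper: the name and signature are assumptions, not the poster's code.
function containsTerm(string $text, string $term): bool
{
    // mb_stripos() compares characters case-insensitively in the given encoding.
    return mb_stripos($text, $term, 0, 'UTF-8') !== false;
}

$text = 'Video: SREČA'; // sample data, assumed for illustration
$term = 'sreča';

// stripos() works on bytes, so it may fail to fold Č to č.
var_dump(stripos($text, $term) !== false);

// The multi-byte version finds the term regardless of case.
var_dump(containsTerm($text, $term)); // bool(true)
```

With a guard like this, the Preg_Replace call is reached for terms that differ from the text only in the case of non-ASCII letters.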

Answer 2

Answered by gnarf

Not sure what your problem is stemming from, but I just put together this little test case:

<?php

$uc = "SREČA";

mb_internal_encoding('utf-8');
echo $uc."\n";
$lc = mb_strtolower($uc);
echo $lc."\n";

echo preg_replace("/\b(".preg_quote($uc).")\b/ui", "<span class='test'>$1</span>", "test:".$lc." end test");

Its output on my machine:

SREČA
sreča
test:<span class='test'>sreča</span> end test

Seems to be working properly?
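
The same idea can be wrapped in a small reusable function. This is only a sketch: the name highlightTerm is hypothetical, and the HighlightTerm class is borrowed from the question. It passes the delimiter to preg_quote() so a slash in the search term cannot break the pattern, and it falls back to the unmodified text if preg_replace() reports an error (for example on invalid UTF-8 input).

```php
<?php
// Hypothetical helper; the name and the fallback behaviour are assumptions.
function highlightTerm(string $content, string $term): string
{
    // Escape regex metacharacters in the term, including the "/" delimiter.
    $pattern = '/\b(' . preg_quote($term, '/') . ')\b/iu';

    // $1 re-inserts the matched text, so the original case is preserved.
    $result = preg_replace($pattern, '<span class="HighlightTerm">$1</span>', $content);

    // preg_replace() returns null on error; keep the original text in that case.
    return $result ?? $content;
}

echo highlightTerm('Judas Priest live', 'judas priest'), "\n";
echo highlightTerm('test:sreča end test', 'SREČA'), "\n";
```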

Answer 3

Answered by troelskn

If I'm not mistaken, preg_match uses the current locale. Try setting the locale to the language these characters belong to. You probably need a UTF-8 based locale too. If you have mixed languages in your page, you may be able to find a generic international locale that works.
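
A rough sketch of what setting a locale could look like follows. The locale names below are assumptions; which ones are installed depends on the system, so setlocale() is given several candidates and its return value is checked. The final line shows that a pattern using the /u modifier folds case for these characters on its own.

```php
<?php
// Candidate locale names are assumptions; availability varies by system.
$candidates = ['sl_SI.UTF-8', 'sl_SI.utf8', 'en_US.UTF-8', 'C.UTF-8'];

// setlocale() tries each name in order and returns false if none can be set.
$locale = setlocale(LC_CTYPE, $candidates);

if ($locale === false) {
    echo "No UTF-8 locale from the candidate list is installed\n";
} else {
    echo "Locale set to $locale\n";
}

// With the /u modifier, case-insensitive matching uses Unicode case folding.
var_dump(preg_match('/sreča/iu', 'SREČA')); // int(1)
```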

See also: http://www.phpwact.org/php/i18n/utf-8
