UTF-8 in PHP regular expressions

Question

Asked by Gasper

I need help with regular expressions. My string contains Unicode characters, and the code below doesn't work.

The first four characters must be digits, followed by a comma, and then any alphabetic characters or whitespace. I have read that adding /u at the end of the regular expression should help, but it didn't work for me.

My code works with non-Unicode characters:

$post = '9999,škofja loka';
echo preg_match('/^[0-9]{4},[\s]*[a-zA-Z]+', $post);

Thanks for your answers!

Answer 1

Answered by stema

Updated answer: this is now tested and working.

$post = '9999, škofja loka';
echo preg_match('/^\d{4},[\s\p{L}]+$/u', $post);

\w will not work here: it may not cover all Unicode letters, and it also matches [0-9_] in addition to letters.
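The difference between \w and \p{L} can be checked with a short sketch. It assumes PHP 7 or later (for the "\u{...}" string escape) and a PCRE build with Unicode support; the test strings are made up for illustration.

```php
<?php
// Illustrative subject: ASCII letters, an underscore and digits.
$subject = "abc_123";

// \w accepts letters, digits and the underscore, so this matches.
var_dump(preg_match('/^\w+$/u', $subject));            // int(1)

// \p{L} accepts letters only, so the underscore and digits make it fail.
var_dump(preg_match('/^\p{L}+$/u', $subject));         // int(0)

// \p{L} does accept non-ASCII letters such as U+0161 (š).
var_dump(preg_match('/^\p{L}+$/u', "\u{0161}kofja"));  // int(1)
```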

The u modifier is also important: it activates Unicode mode.
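What the u modifier changes can be seen on a single non-ASCII character. This is a minimal sketch, assuming PHP 7 or later and a UTF-8 encoded string; the variable name is only illustrative.

```php
<?php
// U+0161 (š) is one character but two bytes in UTF-8.
$char = "\u{0161}";
var_dump(strlen($char));                 // int(2)

// Without u, the pattern works on bytes: "." consumes one byte,
// so a two-byte string cannot match ^.$
var_dump(preg_match('/^.$/', $char));    // int(0)

// With u, pattern and subject are treated as UTF-8,
// and "." consumes the whole character.
var_dump(preg_match('/^.$/u', $char));   // int(1)
```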

If letters or whitespace can appear after the comma, put both into the same character class. Your regex allows zero or more whitespace characters after the comma, and then only letters.

See http://www.regular-expressions.info/php.html for PHP regex details.

The \p{L} escape matches any Unicode letter.

The end-of-string anchor $ is also important: it ensures that the complete string is verified. Without it, the pattern could match just the first whitespace character, for example, and ignore the rest of the string.
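The points of this answer can be combined into one small sketch. The helper name isValidPost and the sample inputs are hypothetical and not part of the original answer; the pattern is the one given above. It assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: true when the string is four digits, a comma,
// and then only letters or whitespace up to the end of the string.
function isValidPost(string $post): bool
{
    return preg_match('/^\d{4},[\s\p{L}]+$/u', $post) === 1;
}

var_dump(isValidPost("9999, \u{0161}kofja loka")); // bool(true)
var_dump(isValidPost("9999, loka 123"));           // bool(false): digits after the comma

// Without the $ anchor, the same input is accepted, because the match
// simply stops before the first character outside the class.
var_dump(preg_match('/^\d{4},[\s\p{L}]+/u', "9999, loka 123")); // int(1)
```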

Answer 2

Answered by jmz

[a-zA-Z] will match only letters in the ranges a-z and A-Z. You have non-US-ASCII letters, so your regex won't match, regardless of the /u modifier. You need to use the word character escape sequence (\w).

$post = '9999,škofja loka';
echo preg_match('/^[0-9]{4},[\s]*[\w]+/u', $post);

Answer 3

Answered by Sodved

The problem is your regular expression. You are explicitly saying that you will only accept a b c ... z A B C ... Z. š is not in the a-z set. Remember, š is as different from s as any other character.

So if you really just want a sequence of letters, you need to test for the Unicode properties, e.g.

echo preg_match('/^[0-9]{4},[\s]*\p{L}+/u', $post);

That should work because \p{L} matches any Unicode character that is considered a letter, not just A through Z.

Answer 4

Answered by searlea

Add a u, and remember the trailing slash:

echo preg_match('/^[0-9]{4},[\s]*[a-zA-Z]+/u', $post);
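The effect of the missing trailing slash can be sketched as follows. This assumes PHP 7 or later; the @ operator is used here only to keep the warning out of the output.

```php
<?php
$post = "9999,\u{0161}kofja loka";

// Pattern from the question: no closing "/" delimiter.
// preg_match() emits a warning and returns false instead of 0 or 1.
$broken = @preg_match('/^[0-9]{4},[\s]*[a-zA-Z]+', $post);
var_dump($broken);   // bool(false)

// With the delimiter and the u modifier the pattern compiles,
// but [a-zA-Z] still rejects the non-ASCII letter right after the comma.
$fixed = preg_match('/^[0-9]{4},[\s]*[a-zA-Z]+/u', $post);
var_dump($fixed);    // int(0)
```

Since echo prints false as an empty string, the original code would show no output at all rather than 0.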

Edited:

echo preg_match('/^\d{4},(?:\s|\w)+/u', $post);
