How to use preg_split() in PHP?
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/24189698/
Asked by MD.MD
Can anybody explain to me how to use the preg_split() function? I don't understand a pattern parameter like "/[\s,]+/".
For example, I have this subject: is is.
And I want the results to be:
array (
0 => 'is',
1 => 'is',
)
So it should ignore the space and the full stop. How can I do that?
Answered by Majenko
preg means "Pcre REGexp", which is kind of redundant, since "PCRE" itself means "Perl Compatible Regular Expressions".
Regexps are a nightmare to the beginner. I still don't fully understand them and I've been working with them for years.
Basically, the example you have there breaks down as:
"/[\s,]+/"
/ = start or end of pattern string (the delimiter)
[ ... ] = character class: matches any one of the characters listed inside
+ = one or more of the preceding character or group
\s = any whitespace character (space, tab, newline)
, = the literal comma character
So you have a search pattern that says "split on any run of one or more whitespace characters and/or commas".
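As a rough sketch of what that pattern does in practice (the first sample string is made up for illustration; the second is the subject from the question), something like this shows why the full stop survives, and one possible way to get rid of it by adding "." to the character class:

```php
<?php
// Split on runs of whitespace and/or commas.
$parts = preg_split('/[\s,]+/', 'hypertext language, programming');
print_r($parts); // hypertext, language, programming

// The subject from the question: "." is not in the class, so it stays attached.
$withDot = preg_split('/[\s,]+/', 'is is.');
print_r($withDot); // is, is.

// Adding "." to the class makes it a separator too. PREG_SPLIT_NO_EMPTY
// drops the empty piece left after the trailing full stop.
$words = preg_split('/[\s,.]+/', 'is is.', -1, PREG_SPLIT_NO_EMPTY);
print_r($words); // is, is
```

Inside a character class the "." is a literal full stop, so it does not need escaping there.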
Other common characters are:
. = any single character (except a newline, by default)
* = zero or more of the preceding character or group
^ (at start of pattern) = the start of the string
$ (at end of pattern) = the end of the string
^ (at the start of [...]) = "NOT": matches any character except those listed
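A few quick checks with preg_match can make these concrete. This is only an illustrative sketch; the patterns and test strings are invented for the example:

```php
<?php
// "." matches any single character (except a newline, by default).
$dot = preg_match('/a.c/', 'abc');          // 1

// "*" allows zero or more of the preceding item, so zero "b"s is fine.
$star = preg_match('/^ab*c$/', 'ac');       // 1

// "^" and "$" anchor the match to the start and end of the string.
$anchored = preg_match('/^is$/', 'this');   // 0: "is" is not the whole string

// "^" right after "[" negates the class: any character that is NOT a digit.
$nonDigit = preg_match('/[^0-9]/', '123');  // 0: the string contains only digits

var_dump($dot, $star, $anchored, $nonDigit);
```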
For PHP there is good information in the official documentation.
Answered by JakeGould
This should work:
$words = preg_split("/(?<=\w)\b\s*[!?.]*/", 'is is.', -1, PREG_SPLIT_NO_EMPTY);
echo '<pre>';
print_r($words);
echo '</pre>';
The output would be:
Array
(
[0] => is
[1] => is
)
Before I explain the regex, a quick explanation of PREG_SPLIT_NO_EMPTY. It means preg_split only returns the pieces that are not empty. This assures you that the array $words truly has data in it and not just empty strings, which can happen when dealing with regex patterns and mixed data sources.
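To see what the flag changes, here is a small sketch that runs the same pattern on the same subject with and without it. The variable names are just for illustration:

```php
<?php
$pattern = '/(?<=\w)\b\s*[!?.]*/';
$subject = 'is is.';

// Without the flag, the match on the trailing "." leaves an empty last piece.
$all = preg_split($pattern, $subject);
print_r($all); // is, is, (empty string)

// With the flag, empty pieces are dropped. The limit of -1 means "no limit".
$nonEmpty = preg_split($pattern, $subject, -1, PREG_SPLIT_NO_EMPTY);
print_r($nonEmpty); // is, is
```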
The regex can be broken down like this using this tool:
NODE EXPLANATION
--------------------------------------------------------------------------------
(?<= look behind to see if there is:
--------------------------------------------------------------------------------
\w word characters (a-z, A-Z, 0-9, _)
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
[!?.]* any character of: '!', '?', '.' (0 or more
times (matching the most amount possible))
A nicer explanation can be found by entering the full regex pattern /(?<=\w)\b\s*[!?.]*/ in this other tool:
(?<=\w) Positive Lookbehind - Assert that the regex below can be matched
  \w match any word character [a-zA-Z0-9_]
\b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
\s* match any white space character [\r\n\t\f ]
  Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
[!?.]* match a single character present in the list
  Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
  !?. a single character in the list !?. literally
That last regex explanation can be boiled down by a human (also known as me) as the following:
Match, and split at, any word boundary that follows a word character, swallowing any whitespace and any of the punctuation marks !?. that come right after it.
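As a quick way to try that pattern on a few inputs, one could wrap it in a small helper. The function name split_words is hypothetical and only used for this sketch, as are the extra sample strings:

```php
<?php
// Hypothetical helper wrapping the pattern from the answer above.
function split_words(string $subject): array
{
    return preg_split('/(?<=\w)\b\s*[!?.]*/', $subject, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(split_words('is is.'));      // is, is
print_r(split_words('is   is?!'));   // is, is (several spaces, several marks)
print_r(split_words('Hello world')); // Hello, world
```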
Answered by Federico Piazza
Documentation says:
The preg_split() function operates exactly like split(), except that regular expressions are accepted as input parameters for pattern.
So, the following code...
<?php
// Split on runs of whitespace and/or commas, then print each piece.
$ip = "123 ,456 ,789 ,000";
$iparr = preg_split('/[\s,]+/', $ip);
foreach ($iparr as $part) {
    print "$part <br />";
}
?>
This will produce the following result.
123
456
789
000
So, if you have this subject: is is
and you want:
array (
0 => 'is',
1 => 'is',
)
you need to modify your regex to "/[\s]+/"
Unless you have is ,is
in which case you need the regex you already have: "/[\s,]+/"
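The difference between the two patterns can be checked side by side. This is a minimal sketch using the two subjects mentioned above; the variable names are only illustrative:

```php
<?php
// Whitespace only: enough when the words are separated by spaces alone.
$spacesOnly = preg_split('/[\s]+/', 'is is');
print_r($spacesOnly); // is, is

// A comma is not a separator for this pattern, so it sticks to the next word.
$commaKept = preg_split('/[\s]+/', 'is ,is');
print_r($commaKept); // is, ,is

// Adding "," to the class treats " ," as a single separator.
$commaSplit = preg_split('/[\s,]+/', 'is ,is');
print_r($commaSplit); // is, is
```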
Answered by ceejayoz
PHP's str_word_count may be a better choice here.
str_word_count($string, 2) will return an array of all words in the string, including duplicates, keyed by each word's position in the string.
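For the subject from the question, the two array-returning formats of str_word_count would look roughly like this (a small sketch; the variable names are illustrative):

```php
<?php
$string = 'is is.';

// Format 1: a plain list of the words found.
$list = str_word_count($string, 1);
print_r($list); // [0] => is, [1] => is

// Format 2: the same words, keyed by their starting position in the string.
$byPosition = str_word_count($string, 2);
print_r($byPosition); // [0] => is, [3] => is
```

If sequential keys are wanted, as in the question's expected output, format 1 is probably the closer fit.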