How to use preg_split() in PHP?

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/24189698/

Time: 2020-08-25 17:12:15  Source: igfitidea

how to use preg_split() in php?

php preg-split

Asked by MD.MD

Can anybody explain to me how to use the preg_split() function? I don't understand the pattern parameter, as in "/[\s,]+/".

For example:

I have this subject: "is is." and I want the results to be:

array (
  0 => 'is',
  1 => 'is',
)

So it should ignore the space and the full stop. How can I do that?

Answered by Majenko

"preg" means "PCRE REGexp", which is kind of redundant, since "PCRE" means "Perl Compatible Regular Expressions".

Regexps are a nightmare to the beginner. I still don't fully understand them and I've been working with them for years.

Basically, the example you have there breaks down as follows:

"/[\s,]+/"

/ = delimiter marking the start and end of the pattern
[ ... ] = character class: matches any one of the characters listed inside
+ = one or more of the preceding character or group
\s = any whitespace character (space, tab, newline, etc.)
, = the literal comma character

So you have a search pattern that means "split on any part of the string that is a run of one or more whitespace characters and/or commas".
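
To make that concrete, here is a minimal sketch of the pattern applied to a made-up input (the sample string and the variable names are assumptions for illustration, not part of the question):

```php
<?php
// Hypothetical sample input mixing single spaces, double spaces and commas.
$input = "apple, banana  cherry,,date";

// Split wherever there is a run of one or more whitespace characters and/or commas.
$parts = preg_split('/[\s,]+/', $input);

print_r($parts);
// $parts is ['apple', 'banana', 'cherry', 'date']
```

Because of the +, a run such as ", " or ",," counts as one delimiter, so no empty strings end up between the words here.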

Other common characters are:

. = any single character (except a newline, by default)
* = zero or more of the preceding character or group
^ (at start of pattern) = the start of the string
$ (at end of pattern) = the end of the string
^ (at start of [...]) = "NOT" any of the characters listed in the class
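
A small sketch of those characters in action may help. It uses preg_match(), which returns 1 when the pattern matches and 0 when it does not; the sample patterns and strings are my own assumptions, chosen only to illustrate each character:

```php
<?php
// Each entry is 1 (match) or 0 (no match).
$checks = [
    'dot'          => preg_match('/^c.t$/', 'cat'),       // . stands in for the "a"
    'star_zero'    => preg_match('/^ca*t$/', 'ct'),       // * allows zero "a" characters
    'star_many'    => preg_match('/^ca*t$/', 'caaat'),    // ... or several of them
    'caret_start'  => preg_match('/^cat/', 'concat'),     // "cat" is not at the start
    'dollar_end'   => preg_match('/cat$/', 'concat'),     // "cat" is at the end
    'negated_ok'   => preg_match('/^[^0-9]+$/', 'abc'),   // only non-digits
    'negated_fail' => preg_match('/^[^0-9]+$/', 'abc1'),  // contains a digit
];

print_r($checks);
```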

For PHP there is good information in the official documentation.

Answered by JakeGould

This should work:

$words = preg_split("/(?<=\w)\b\s*[!?.]*/", 'is is.', -1, PREG_SPLIT_NO_EMPTY);

echo '<pre>';
print_r($words);
echo '</pre>';

The output would be:

Array
(
    [0] => is
    [1] => is
)

Before I explain the regex, a quick explanation of PREG_SPLIT_NO_EMPTY. That flag means preg_split() only returns the pieces that are not empty. This assures you that the array $words truly has data in it and not just empty values, which can happen when dealing with regex patterns and mixed data sources.
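
To see what the flag changes, here is a minimal sketch comparing the two calls on a made-up input (the sample string and variable names are assumptions for illustration):

```php
<?php
// Hypothetical input with a leading, a doubled and a trailing comma.
$input = ",a,,b,";

// Without the flag, every gap between delimiters is returned, even empty ones.
$withEmpty = preg_split('/,/', $input);
// $withEmpty is ['', 'a', '', 'b', '']

// With the flag, the empty pieces are dropped.
$noEmpty = preg_split('/,/', $input, -1, PREG_SPLIT_NO_EMPTY);
// $noEmpty is ['a', 'b']

print_r($withEmpty);
print_r($noEmpty);
```

The -1 in the third position means "no limit" on the number of pieces; it has to be passed so that the flag can go in the fourth position.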

And the explanation of that regex can be broken down like this using this tool:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    \w                       word characters (a-z, A-Z, 0-9, _)
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  [!?.]*                   any character of: '!', '?', '.' (0 or more
                           times (matching the most amount possible))

A nicer explanation can be found by entering the full regex pattern of /(?<=\w)\b\s*[!?.]*/ in this other tool:

  • (?<=\w) Positive Lookbehind - assert that the regex below can be matched
  • \w match any word character [a-zA-Z0-9_]
  • \b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
  • \s* match any white space character [\r\n\t\f ]
  • Quantifier: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
  • [!?.]* match a single character in the list !?. literally

That last regex explanation can be boiled down by a human (also known as me) to the following:

Split at any word boundary that comes right after a word character, consuming any whitespace and any of the punctuation marks !?. that follow it.
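
As a rough check of that summary, here is a sketch that runs the same pattern on the string from the question and on a longer sentence of my own (the second sentence and the variable names are assumptions, not from the original answer):

```php
<?php
$pattern = '/(?<=\w)\b\s*[!?.]*/';

// The subject from the question.
$short = preg_split($pattern, 'is is.', -1, PREG_SPLIT_NO_EMPTY);
// $short is ['is', 'is']

// A hypothetical longer sentence where punctuation is followed by a space.
$long = preg_split($pattern, 'Hello world! How are you?', -1, PREG_SPLIT_NO_EMPTY);
// $long is ['Hello', 'world', ' How', 'are', 'you']

// Trimming each piece removes the leftover leading space.
$trimmed = array_map('trim', $long);

print_r($short);
print_r($long);
print_r($trimmed);
```

Note the leading space in ' How': the pattern looks for whitespace before punctuation, so the space that comes after "!" is not consumed. If that matters for your data, trimming the pieces as shown is one way around it.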

Answered by Federico Piazza

Documentation says:

The preg_split() function operates exactly like split(), except that regular expressions are accepted as input parameters for pattern.

So, the following code...

<?php

$ip = "123 ,456 ,789 ,000"; 
$iparr = preg_split ("/[\s,]+/", $ip); 
print "$iparr[0] <br />";
print "$iparr[1] <br />" ;
print "$iparr[2] <br />"  ;
print "$iparr[3] <br />"  ;

?>

This will produce the following result:

123
456
789
000 

So, if you have this subject: "is is" and you want: array ( 0 => 'is', 1 => 'is', )

you need to modify your regex to "/[\s]+/"

Unless you have "is ,is", in which case you need the regex you already have: "/[\s,]+/"
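
Here is a short sketch of both patterns side by side. The last call is my own variation, not part of this answer: it assumes you also want to drop the full stop from the question's original subject "is is.", so it adds a dot to the character class and uses PREG_SPLIT_NO_EMPTY to discard the empty piece after the final dot.

```php
<?php
// Whitespace only: enough for "is is".
$spaces = preg_split('/[\s]+/', 'is is');
// $spaces is ['is', 'is']

// Whitespace and commas: needed for "is ,is".
$commas = preg_split('/[\s,]+/', 'is ,is');
// $commas is ['is', 'is']

// Whitespace only leaves the full stop attached to the last word.
$withDot = preg_split('/[\s]+/', 'is is.');
// $withDot is ['is', 'is.']

// Assumed variation: treat the full stop as a delimiter too.
$noDot = preg_split('/[\s.]+/', 'is is.', -1, PREG_SPLIT_NO_EMPTY);
// $noDot is ['is', 'is']

print_r($noDot);
```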

Answered by ceejayoz

PHP's str_word_count may be a better choice here.

str_word_count($string, 2) will return an array of all words in the string, including duplicates, keyed by each word's position in the string.
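
A minimal sketch on the subject from the question, showing format 2 next to format 1 (the variable names are assumptions for illustration):

```php
<?php
$string = 'is is.';

// Format 2: keys are the positions where each word starts in the string.
$byPosition = str_word_count($string, 2);
// $byPosition is [0 => 'is', 3 => 'is']

// Format 1: a plain list with keys 0, 1, 2, ...
$asList = str_word_count($string, 1);
// $asList is ['is', 'is']

print_r($byPosition);
print_r($asList);
```

If you want exactly the array shown in the question (keys 0 and 1), format 1 looks like the closer fit; array_values() on the format 2 result would give the same list.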