PHP regex: match any whitespace

Note: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/21974376/

Time: 2020-08-25 04:35:39  Source: igfitidea

regex match any whitespace

php regex

Asked by user3344311

I want to make a replacement using a regex and the preg_replace function. This is my code:

$verif = "/wordA(\s*)wordB(?! wordc)/i";
$replacement = 'wordA wordb wordc';
$newvar = preg_replace($verif, $replacement, $article->text);

That works only if there is exactly one whitespace character between wordA and wordB. I need to match any number of whitespace characters between wordA and wordB.

Example:

wordA (10 or more whitespace characters) wordB -> wordA wordb wordc
Same for wordA (1 whitespace character) wordB -> wordA wordb wordc ...

Answered by acarlon

Your regex should work as-is, assuming that it is doing what you want it to.

wordA(\s*)wordB(?! wordc)

This means: match wordA followed by 0 or more whitespace characters followed by wordB, but do not match if that is followed by a space and wordc. Note the single space between ?! and wordc, which means that wordA wordB wordc (one space before wordc) will not match, but wordA wordB  wordc (two spaces before wordc) will.

Here are some example matches and the associated replacement output:

[Image: example matches and their replacement output]
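The behaviour described above can be checked with a short script. This is only a sketch: the pattern and replacement come from the question, while the sample strings are made up for illustration.

```php
<?php
// Pattern from the question: wordA, any amount of whitespace, wordB,
// not followed by exactly one space and "wordc" (case-insensitive via /i).
$verif = '/wordA(\s*)wordB(?! wordc)/i';
$replacement = 'wordA wordb wordc';

// Hypothetical sample inputs, not taken from the original post.
$samples = [
    "wordA wordB",           // one space
    "wordA          wordB",  // ten spaces
    "wordAwordB",            // no space at all (\s* allows zero)
    "wordA wordB wordc",     // one space before wordc: lookahead blocks the match
    "wordA wordB  wordc",    // two spaces before wordc: lookahead does not block
];

foreach ($samples as $text) {
    $result = preg_replace($verif, $replacement, $text);
    echo json_encode($text), ' => ', json_encode($result), PHP_EOL;
}
```

Only the fourth sample is left unchanged. The last one is replaced because the lookahead checks for exactly one literal space before wordc.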

Note that all matches are replaced no matter how many spaces there are. There are a couple of other points:

  • (?! wordc) is a negative lookahead, so you won't match lines such as wordA wordB wordc, which I assume is intended (and is why the last line is not matched). Currently you are relying on the literal space after ?! to match the whitespace. You may want to be more precise and use (?!\swordc). If you want to match against more than one space before wordc you can use (?!\s*wordc) for 0 or more whitespace characters or (?!\s+wordc) for 1 or more, depending on what your intention is. Of course, if you do want to match lines with wordc after wordB then you shouldn't use a negative lookahead.

  • * will match 0 or more whitespace characters, so it will match wordAwordB. You may want to consider + if you want at least one.

  • (\s*) - the brackets indicate a capturing group. Are you capturing the whitespace to a group for a reason? If not, you could just remove the brackets, i.e. just use \s*.
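To make the difference between these variations concrete, here is a minimal sketch. The variable names and the sample strings are hypothetical, chosen only for illustration; the capturing group is dropped as suggested in the last point.

```php
<?php
// Three variations on the original pattern (all without the capture group).
$single = '/wordA\s*wordB(?!\swordc)/i';   // blocks only "one whitespace + wordc"
$any    = '/wordA\s*wordB(?!\s*wordc)/i';  // blocks "zero or more whitespace + wordc"
$oneUp  = '/wordA\s+wordB(?!\s+wordc)/i';  // needs at least one whitespace after wordA

// Made-up sample: three spaces between wordB and wordc.
$text = "wordA wordB   wordc";

var_dump(preg_match($single, $text));        // int(1): three spaces are not "one whitespace"
var_dump(preg_match($any, $text));           // int(0): \s* in the lookahead covers all three
var_dump(preg_match($oneUp, "wordAwordB"));  // int(0): \s+ requires at least one whitespace
```

Which variant is appropriate depends on whether text such as wordAwordB, or wordc separated by several spaces, should count.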

Update based on comment

Hello, the problem is not the expression but the &nbsp; in the HTML output, which is not considered whitespace. It's a Joomla website.

Preserving your original regex you can use:

wordA((?:\s|&nbsp;)*)wordB(?!(?:\s|&nbsp;)wordc)

The only difference is that now the regex matches whitespace OR &nbsp;. I replaced the literal space before wordc with \s since that is more explicit. Note, as I have already pointed out, that the negative lookahead ?! will not match when wordB is followed by a single whitespace character and wordc. If you want to handle multiple whitespace characters there, see my comments above. I also preserved the capture group around the whitespace; if you don't want it, remove the brackets as already described above.

Example matches:

[Image: example matches]
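A quick way to try the updated pattern is sketched below. It assumes the text contains the literal HTML entity &nbsp; (six characters), not an already-decoded non-breaking space character; the HTML fragment is made up for illustration.

```php
<?php
// Pattern from the update: the literal entity &nbsp; is treated like whitespace.
$verif = '/wordA((?:\s|&nbsp;)*)wordB(?!(?:\s|&nbsp;)wordc)/i';
$replacement = 'wordA wordb wordc';

// Hypothetical fragment: the first occurrence mixes &nbsp; and a plain space,
// the second is already followed by "&nbsp;wordc" and should be left alone.
$html = "wordA&nbsp;&nbsp; wordB and wordA wordB&nbsp;wordc";

echo preg_replace($verif, $replacement, $html), PHP_EOL;
// prints: wordA wordb wordc and wordA wordB&nbsp;wordc
```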

Answered by mrres1

The reason I used a + instead of a * is that a plus means one or more of the preceding element, whereas an asterisk means zero or more. In this case we want a delimiter that's a little more concrete, so "one or more" spaces.

word[Aa]\s+word[Bb]\s+word[Cc]

will match:

wordA wordB     wordC
worda wordb wordc
wordA   wordb   wordC

The words in this expression have to be specific, and they also have to appear in order (a, b, then c).
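As a rough check of this expression, the sketch below runs it with preg_match against the three lines listed above plus two made-up counterexamples (the last two entries are not from the original answer).

```php
<?php
// Pattern from this answer: each word accepts either case for its last letter,
// and \s+ requires at least one whitespace character between words.
$pattern = '/word[Aa]\s+word[Bb]\s+word[Cc]/';

$lines = [
    "wordA wordB     wordC",
    "worda wordb wordc",
    "wordA   wordb   wordC",
    "wordAwordB wordC",   // hypothetical: no whitespace between the first two words
    "wordB wordA wordC",  // hypothetical: words in the wrong order
];

foreach ($lines as $line) {
    $verdict = preg_match($pattern, $line) ? 'match' : 'no match';
    echo json_encode($line), ' => ', $verdict, PHP_EOL;
}
```

The first three lines match; the last two do not, because \s+ needs at least one whitespace character and the words must appear in the order a, b, c.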