我们如何将 a^nb^n 与 Java 正则表达式匹配?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/3644266/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How can we match a^n b^n with Java regex?
提问by polygenelubricants
This is the second part of a series of educational regex articles. It shows how lookaheads and nested references can be used to match the non-regular languge anbn. Nested references are first introduced in: How does this regex find triangular numbers?
这是一系列教育正则表达式文章的第二部分。它展示了如何使用前瞻和嵌套引用来匹配非常规语言 a nb n。嵌套引用首先在:这个正则表达式如何找到三角形数?
One of the archetypal non-regular languagesis:
原型非常规语言之一是:
L = { a
nb
n: n > 0 }
L = { a
nb
n: n > 0 }
This is the language of all non-empty strings consisting of some number of a
's followed by an equal number of b
's. Examples of strings in this language are ab
, aabb
, aaabbb
.
这是所有非空字符串的语言,由一定数量的a
's 后跟相同数量的b
's 组成。这种语言中的字符串示例有ab
, aabb
, aaabbb
。
This language can be show to be non-regular by the pumping lemma. It is in fact an archetypal context-free language, which can be generated by the context-free grammarS → aSb | ab
.
这种语言可以通过泵引理显示为非正则。它实际上是一种原型上下文无关语言,可以由上下文无关文法生成S → aSb | ab
。
Nonetheless, modern day regex implementations clearly recognize more than just regular languages. That is, they are not "regular" by formal language theory definition. PCRE and Perl supports recursive regex, and .NET supports balancing groups definition. Even less "fancy" features, e.g. backreference matching, means that regex is not regular.
尽管如此,现代正则表达式实现清楚地识别出的不仅仅是常规语言。也就是说,它们不是形式语言理论定义的“常规”。PCRE 和 Perl 支持递归正则表达式,.NET 支持平衡组定义。甚至不那么“花哨”的功能,例如反向引用匹配,意味着正则表达式不是常规的。
But just how powerful is this "basic" features? Can we recognize L
with Java regex, for example? Can we perhaps combine lookarounds and nested references and have a pattern that works with e.g. String.matches
to match strings like ab
, aabb
, aaabbb
, etc?
但这个“基本”功能到底有多强大?L
例如,我们可以使用 Java 正则表达式识别吗?我们也许可以结合lookarounds和嵌套引用,并有一个模式,与如作品String.matches
来匹配字符串一样ab
,aabb
,aaabbb
,等?
References
参考
- perlfaq6: Can I use Perl regular expressions to match balanced text?
- MSDN - Regular Expression Language Elements - Balancing Group Definitions
- pcre.org - PCRE man page
- regular-expressions.info- Lookaroundsand Grouping and Backreferences
java.util.regex.Pattern
- perlfaq6:我可以使用 Perl 正则表达式来匹配平衡文本吗?
- MSDN - 正则表达式语言元素 - 平衡组定义
- pcre.org - PCRE 手册页
- regular-expressions.info- Lookarounds和分组和反向引用
java.util.regex.Pattern
Linked questions
相关问题
回答by polygenelubricants
The answer is, needless to say, YES!You can most certainly write a Java regex pattern to match anbn. It uses a positive lookahead for assertion, and one nested reference for "counting".
答案是,不用说,是的!你可以肯定写一个Java正则表达式匹配一个ñb ñ。它对断言使用正向前瞻,对“计数”使用一个嵌套引用。
Rather than immediately giving out the pattern, this answer will guide readers through the processof deriving it. Various hints are given as the solution is slowly constructed. In this aspect, hopefully this answer will contain much more than just another neat regex pattern. Hopefully readers will also learn how to "think in regex", and how to put various constructs harmoniously together, so they can derive more patterns on their own in the future.
这个答案不会立即给出模式,而是会引导读者完成推导它的过程。随着解决方案的慢慢构建,给出了各种提示。在这方面,希望这个答案不仅仅包含另一个简洁的正则表达式模式。希望读者也能学会如何“在正则表达式中思考”,如何将各种结构和谐地组合在一起,以便日后自己推导出更多的模式。
The language used to develop the solution will be PHP for its conciseness. The final test once the pattern is finalized will be done in Java.
用于开发解决方案的语言将是 PHP,因为它的简洁性。模式完成后的最终测试将在 Java 中完成。
Step 1: Lookahead for assertion
第 1 步:前瞻断言
Let's start with a simpler problem: we want to match a+
at the beginning of a string, but only if it's followed immediately by b+
. We can use ^
to anchorour match, and since we only want to match the a+
without the b+
, we can use lookaheadassertion (?=…)
.
让我们从一个更简单的问题开始:我们想a+
在字符串的开头匹配,但前提是它后面紧跟b+
. 我们可以使用^
来锚定我们的匹配,因为我们只想匹配a+
没有 的b+
,我们可以使用先行断言(?=…)
。
Here is our pattern with a simple test harness:
这是我们使用简单测试工具的模式:
function testAll($r, $tests) {
foreach ($tests as $test) {
$isMatch = preg_match($r, $test, $groups);
$groupsJoined = join('|', $groups);
print("$test $isMatch $groupsJoined\n");
}
}
$tests = array('aaa', 'aaab', 'aaaxb', 'xaaab', 'b', 'abbb');
$r1 = '/^a+(?=b+)/';
# └────┘
# lookahead
testAll($r1, $tests);
The output is (as seen on ideone.com):
输出是(如在 ideone.com 上看到的):
aaa 0
aaab 1 aaa
aaaxb 0
xaaab 0
b 0
abbb 1 a
This is exactly the output we want: we match a+
, only if it's at the beginning of the string, and only if it's immediately followed by b+
.
这正是我们想要的输出:我们匹配a+
,仅当它位于字符串的开头,并且仅当它紧跟其后b+
。
Lesson: You can use patterns in lookarounds to make assertions.
课程:您可以在环视中使用模式来做出断言。
Step 2: Capturing in a lookahead (and f r e e - s p a c i n g mode)
第 2 步:在前瞻中捕获(和自由间距模式)
Now let's say that even though we don't want the b+
to be part of the match, we do want to captureit anyway into group 1. Also, as we anticipate having a more complicated pattern, let's use x
modifier for free-spacingso we can make our regex more readable.
现在让我们说,即使我们不希望b+
成为比赛的一部分,我们仍然希望将其捕获到组 1 中。此外,由于我们预计会有更复杂的模式,让我们使用x
修饰符进行自由间距,因此我们可以使我们的正则表达式更具可读性。
Building on our previous PHP snippet, we now have the following pattern:
在我们之前的 PHP 代码段的基础上,我们现在有以下模式:
$r2 = '/ ^ a+ (?= (b+) ) /x';
# │ └──┘ │
# │ 1 │
# └────────┘
# lookahead
testAll($r2, $tests);
The output is now (as seen on ideone.com):
输出现在是(如 ideone.com 上所见):
aaa 0
aaab 1 aaa|b
aaaxb 0
xaaab 0
b 0
abbb 1 a|bbb
Note that e.g. aaa|b
is the result of join
-ing what each group captured with '|'
. In this case, group 0 (i.e. what the pattern matched) captured aaa
, and group 1 captured b
.
请注意, egaaa|b
是join
-ing 每个组用 捕获的结果'|'
。在这种情况下,第 0 组(即模式匹配的内容)被捕获aaa
,第 1 组被捕获b
。
Lesson: You can capture inside a lookaround. You can use free-spacing to enhance readability.
课程:您可以在环视中捕捉。您可以使用自由间距来增强可读性。
Step 3: Refactoring the lookahead into the "loop"
第 3 步:将前瞻重构为“循环”
Before we can introduce our counting mechanism, we need to do one modification to our pattern. Currently, the lookahead is outside of the +
repetition "loop". This is fine so far because we just wanted to assert that there's a b+
following our a+
, but what we reallywant to do eventually is assert that for each a
that we match inside the "loop", there's a corresponding b
to go with it.
在我们介绍我们的计数机制之前,我们需要对我们的模式做一个修改。目前,前瞻在+
重复“循环”之外。到目前为止这很好,因为我们只是想断言有一个b+
跟随我们的a+
,但我们最终真正想做的是断言对于a
我们在“循环”内匹配的每个,都有一个对应的b
。
Let's not worry about the counting mechanism for now and just do the refactoring as follows:
暂时不用担心计数机制,只需进行如下重构:
- First refactor
a+
to(?: a )+
(note that(?:…)
is a non-capturing group) - Then move the lookahead inside this non-capturing group
- Note that we must now "skip"
a*
before we can "see" theb+
, so modify the pattern accordingly
- Note that we must now "skip"
- 首先重构
a+
为(?: a )+
(注意(?:…)
是非捕获组) - 然后在这个非捕获组内移动前瞻
- 请注意,我们现在必须“跳过”
a*
才能“看到”b+
,因此相应地修改模式
- 请注意,我们现在必须“跳过”
So we now have the following:
所以我们现在有以下内容:
$r3 = '/ ^ (?: a (?= a* (b+) ) )+ /x';
# │ │ └──┘ │ │
# │ │ 1 │ │
# │ └───────────┘ │
# │ lookahead │
# └───────────────────┘
# non-capturing group
The output is the same as before (as seen on ideone.com), so there's no change in that regard. The important thing is that now we are making the assertion at every iterationof the +
"loop". With our current pattern, this is not necessary, but next we'll make group 1 "count" for us using self-reference.
输出与以前相同(如在 ideone.com 上看到的),因此在这方面没有变化。重要的是,现在我们在“循环”的每次迭代中进行断言+
。对于我们当前的模式,这不是必需的,但接下来我们将使用自我引用为我们“计数”组 1。
Lesson: You can capture inside a non-capturing group. Lookarounds can be repeated.
课程:您可以在非捕获组内进行捕获。环视可以重复。
Step 4: This is the step where we start counting
第 4 步:这是我们开始计数的步骤
Here's what we're going to do: we'll rewrite group 1 such that:
这是我们要做的:我们将重写第 1 组,使其:
- At the end of the first iteration of the
+
, when the firsta
is matched, it should captureb
- At the end of the second iteration, when another
a
is matched, it should capturebb
- At the end of the third iteration, it should capture
bbb
- ...
- At the end of the n-th iteration, group 1 should capture bn
- If there aren't enough
b
to capture into group 1 then the assertion simply fails
- 在 的第一次迭代结束
+
时,当第一个a
匹配时,它应该捕获b
- 在第二次迭代结束时,当另一个
a
匹配时,它应该捕获bb
- 在第三次迭代结束时,它应该捕获
bbb
- ...
- 在第n次迭代结束时,第 1 组应捕获b n
- 如果没有足够的数据
b
可以捕获到组 1 中,那么断言就会失败
So group 1, which is now (b+)
, will have to be rewritten to something like (\1 b)
. That is, we try to "add" a b
to what group 1 captured in the previous iteration.
因此,现在是 1 组,(b+)
必须将其重写为类似(\1 b)
. 也就是说,我们尝试将 a 添加b
到前一次迭代中捕获的组 1。
There's a slight problem here in that this pattern is missing the "base case", i.e. the case where it can match without the self-reference. A base case is required because group 1 starts "uninitialized"; it hasn't captured anything yet (not even an empty string), so a self-reference attempt will always fail.
这里有一个小问题,因为这个模式缺少“基本情况”,即它可以在没有自引用的情况下进行匹配的情况。需要一个基本情况,因为第 1 组开始“未初始化”;它尚未捕获任何内容(甚至不是空字符串),因此自引用尝试将始终失败。
There are many ways around this, but for now let's just make the self-reference matching optional, i.e. \1?
. This may or may not work perfectly, but let's just see what that does, and if there's any problem then we'll cross that bridge when we come to it. Also, we'll add some more test cases while we're at it.
有很多方法可以解决这个问题,但现在让我们将自引用匹配设为可选,即\1?
. 这可能会也可能不会完美地工作,但让我们看看它的作用,如果有任何问题,那么当我们来到它时我们会越过那座桥。此外,我们将在此过程中添加更多测试用例。
$tests = array(
'aaa', 'aaab', 'aaaxb', 'xaaab', 'b', 'abbb', 'aabb', 'aaabbbbb', 'aaaaabbb'
);
$r4 = '/ ^ (?: a (?= a* (? b) ) )+ /x';
# │ │ └─────┘ | │
# │ │ 1 | │
# │ └──────────────┘ │
# │ lookahead │
# └──────────────────────┘
# non-capturing group
The output is now (as seen on ideone.com):
输出现在是(如 ideone.com 上所见):
aaa 0
aaab 1 aaa|b # (*gasp!*)
aaaxb 0
xaaab 0
b 0
abbb 1 a|b # yes!
aabb 1 aa|bb # YES!!
aaabbbbb 1 aaa|bbb # YESS!!!
aaaaabbb 1 aaaaa|bb # NOOOOOoooooo....
A-ha! It looks like we're really close to the solution now! We managed to get group 1 to "count" using self-reference! But wait... something is wrong with the second and the last test cases!! There aren't enough b
s, and somehow it counted wrong! We'll examine why this happened in the next step.
啊哈!看起来我们现在真的很接近解决方案了!我们设法使用自我参考让第 1 组“计数”!但是等等......第二个也是最后一个测试用例有问题!!没有足够的b
s,不知怎么算错了!我们将在下一步中研究为什么会发生这种情况。
Lesson: One way to "initialize" a self-referencing group is to make the self-reference matching optional.
课程:“初始化”自引用组的一种方法是使自引用匹配成为可选。
Step 4?: Understanding what went wrong
第 4 步?:了解出了什么问题
The problem is that since we made the self-reference matching optional, the "counter" can "reset" back to 0 when there aren't enough b
's. Let's closely examine what happens at every iteration of our pattern with aaaaabbb
as input.
问题是,由于我们将自引用匹配设为可选,当没有足够的b
'时,“计数器”可以“重置”回 0 。让我们仔细检查在我们的模式每次迭代中使用aaaaabbb
as 输入时会发生什么。
a a a a a b b b
↑
# Initial state: Group 1 is "uninitialized".
_
a a a a a b b b
↑
# 1st iteration: Group 1 couldn't match since it was "uninitialized",
# so it matched and captured just b
___
a a a a a b b b
↑
# 2nd iteration: Group 1 matched b and captured bb
_____
a a a a a b b b
↑
# 3rd iteration: Group 1 matched b and captured bbb
_
a a a a a b b b
↑
# 4th iteration: Group 1 could still match , but not b,
# (!!!) so it matched and captured just b
___
a a a a a b b b
↑
# 5th iteration: Group 1 matched b and captured bb
#
# No more a, + "loop" terminates
A-ha! On our 4th iteration, we could still match \1
, but we couldn't match \1b
! Since we allow the self-reference matching to be optional with \1?
, the engine backtracks and took the "no thanks" option, which then allows us to match and capture just b
!
啊哈!在我们的第 4 次迭代中,我们仍然可以匹配\1
,但我们无法匹配\1b
!由于我们允许自引用匹配是可选的\1?
,引擎回溯并采用“不,谢谢”选项,然后允许我们匹配和捕获b
!
Do note, however, that except on the very first iteration, you could always match just the self-reference \1
. This is obvious, of course, since it's what we just captured on our previous iteration, and in our setup we can always match it again (e.g. if we captured bbb
last time, we're guaranteed that there will still be bbb
, but there may or may not be bbbb
this time).
但是请注意,除了第一次迭代外,您始终可以只匹配 self-reference \1
。当然,这是显而易见的,因为这是我们在上一次迭代中刚刚捕获的内容,并且在我们的设置中,我们始终可以再次匹配它(例如,如果我们bbb
上次捕获,我们保证仍然会有bbb
,但是可能或bbbb
这次可能不是)。
Lesson: Beware of backtracking. The regex engine will do as much backtracking as you allow until the given pattern matches. This may impact performance (i.e. catastrophic backtracking) and/or correctness.
教训:当心回溯。正则表达式引擎将尽可能多地进行回溯,直到给定的模式匹配为止。这可能会影响性能(即灾难性回溯)和/或正确性。
Step 5: Self-possession to the rescue!
第 5 步:自我占有来拯救!
The "fix" should now be obvious: combine optional repetition with possessivequantifier. That is, instead of simply ?
, use ?+
instead (remember that a repetition that is quantified as possessive does not backtrack, even if such "cooperation" may result in a match of the overall pattern).
“修复”现在应该是显而易见的:将可选的重复与所有格量词结合起来。也就是说,不是简单的?
,?+
而是使用(请记住,被量化为所有格的重复不会回溯,即使这种“合作”可能导致整体模式的匹配)。
In very informal terms, this is what ?+
, ?
and ??
says:
用非常非正式的术语来说,这就是?+
,?
并??
说:
?+
- (optional) "It doesn't have to be there,"
- (possessive) "but if it is there, you must take it and not let go!"
?
- (optional) "It doesn't have to be there,"
- (greedy) "but if it is you can take it for now,"
- (backtracking) "but you may be asked to let it go later!"
??
- (optional) "It doesn't have to be there,"
- (reluctant) "and even if it is you don't have to take it just yet,"
- (backtracking) "but you may be asked to take it later!"
?+
- (可选)“它不必在那里,”
- (占有)“但如果它在那里,你必须抓住它,不要放手!”
?
- (可选)“它不必在那里,”
- (贪婪)“但如果是的话,你可以暂时接受,”
- (回溯)“但你可能会被要求稍后放手!”
??
- (可选)“它不必在那里,”
- (不情愿)“即使是你也不必接受它,”
- (回溯)“但你可能会被要求稍后服用!”
In our setup, \1
will not be there the very first time, but it will alwaysbe there any time after that, and we alwayswant to match it then. Thus, \1?+
would accomplish exactly what we want.
在我们的设置,\1
不会在那里的第一次,但它会永远在那里后,任何时间,我们总是想那么匹配。因此,\1?+
将完成我们想要的。
$r5 = '/ ^ (?: a (?= a* (?+ b) ) )+ /x';
# │ │ └──────┘ │ │
# │ │ 1 │ │
# │ └───────────────┘ │
# │ lookahead │
# └───────────────────────┘
# non-capturing group
Now the output is (as seen on ideone.com):
现在输出是(如在 ideone.com 上看到的):
aaa 0
aaab 1 a|b # Yay! Fixed!
aaaxb 0
xaaab 0
b 0
abbb 1 a|b
aabb 1 aa|bb
aaabbbbb 1 aaa|bbb
aaaaabbb 1 aaa|bbb # Hurrahh!!!
Voilà!!! Problem solved!!! We are now counting properly, exactly the way we want it to!
瞧!!!问题解决了!!!我们现在正在正确地计数,正是我们想要的方式!
Lesson: Learn the difference between greedy, reluctant, and possessive repetition. Optional-possessive can be a powerful combination.
课程: 了解贪婪、不情愿和占有性重复之间的区别。Optional-possessive 可以是一个强大的组合。
Step 6: Finishing touches
第 6 步:收尾工作
So what we have right now is a pattern that matches a
repeatedly, and for every a
that was matched, there is a corresponding b
captured in group 1. The +
terminates when there are no more a
, or if the assertion failed because there isn't a corresponding b
for an a
.
所以我们现在拥有的是一个a
重复匹配的模式,对于每个匹配的模式,在组 1 中a
都有一个对应的b
捕获。+
当没有更多的时候终止a
,或者如果断言失败,因为没有对应的b
for一个a
。
To finish the job, we simply need to append to our pattern \1 $
. This is now a back reference to what group 1 matched, followed by the end of the line anchor. The anchor ensures that there aren't any extra b
's in the string; in other words, that in fact we have anbn.
为了完成这项工作,我们只需要附加到我们的模式\1 $
。现在这是对第 1 组匹配内容的反向引用,后跟行锚点的结尾。锚点确保b
字符串中没有任何额外的's;换句话说,事实上我们有一个nb n。
Here's the finalized pattern, with additional test cases, including one that's 10,000 characters long:
这是最终的模式,带有额外的测试用例,包括一个长度为 10,000 个字符的测试用例:
$tests = array(
'aaa', 'aaab', 'aaaxb', 'xaaab', 'b', 'abbb', 'aabb', 'aaabbbbb', 'aaaaabbb',
'', 'ab', 'abb', 'aab', 'aaaabb', 'aaabbb', 'bbbaaa', 'ababab', 'abc',
str_repeat('a', 5000).str_repeat('b', 5000)
);
$r6 = '/ ^ (?: a (?= a* (?+ b) ) )+ $ /x';
# │ │ └──────┘ │ │
# │ │ 1 │ │
# │ └───────────────┘ │
# │ lookahead │
# └───────────────────────┘
# non-capturing group
It finds 4 matches: ab
, aabb
, aaabbb
, and the a5000b5000. It takes only 0.06s to run on ideone.com.
它找到 4 个匹配项:ab
、aabb
、aaabbb
和a 5000b 5000。在ideone.com 上运行只需 0.06 秒。
Step 7: The Java test
第 7 步:Java 测试
So the pattern works in PHP, but the ultimate goal is to write a pattern that works in Java.
所以该模式在 PHP 中有效,但最终目标是编写一个在 Java 中有效的模式。
public static void main(String[] args) {
String aNbN = "(?x) (?: a (?= a* (\1?+ b)) )+ \1";
String[] tests = {
"", // false
"ab", // true
"abb", // false
"aab", // false
"aabb", // true
"abab", // false
"abc", // false
repeat('a', 5000) + repeat('b', 4999), // false
repeat('a', 5000) + repeat('b', 5000), // true
repeat('a', 5000) + repeat('b', 5001), // false
};
for (String test : tests) {
System.out.printf("[%s]%n %s%n%n", test, test.matches(aNbN));
}
}
static String repeat(char ch, int n) {
return new String(new char[n]).replace('function is_anbn($s) {
return (preg_match('/^(a+)(b+)$/', $s, $groups)) &&
(strlen($groups[1]) == strlen($groups[2]));
}
', ch);
}
The pattern works as expected (as seen on ideone.com).
该模式按预期工作(如 ideone.com 所示)。
And now we come to the conclusion...
现在我们得出结论......
It needs to be said that the a*
in the lookahead, and indeed the "main +
loop", both permit backtracking. Readers are encouraged to confirm why this is not a problem in terms of correctness, and why at the same time making both possessive would also work (though perhaps mixing mandatory and non-mandatory possessive quantifier in the same pattern may lead to misperceptions).
需要说的是,a*
在前瞻中,实际上在“主+
循环”中,都允许回溯。鼓励读者确认为什么这在正确性方面不是问题,以及为什么同时使两个所有格都有效(尽管在同一模式中混合强制性和非强制性所有格可能会导致误解)。
It should also be said that while it's neat that there's a regex pattern that will match anbn, this is in not always the "best" solution in practice. A much better solution is to simply match ^(a+)(b+)$
, and then compare the length of the strings captured by groups 1 and 2 in the hosting programming language.
还应该说,虽然它的整洁,有一个正则表达式将匹配一个ñb ñ,这是在实践中并不总是“最佳”的解决方案。更好的解决方案是简单地匹配^(a+)(b+)$
,然后比较托管编程语言中第 1 组和第 2 组捕获的字符串的长度。
In PHP, it may look something like this (as seen in ideone.com):
在 PHP 中,它可能看起来像这样(如 ideone.com 中所示):
$rRecursive = '/ ^ (a (?1)? b) $ /x';
The purpose of this article is NOTto convince readers that regex can do almost anything; it clearly can't, and even for the things it can do, at least partial delegation to the hosting language should be considered if it leads to a simpler solution.
这篇文章的目的不是要让读者相信正则表达式几乎可以做任何事情;它显然不能,即使对于它可以做的事情,如果它导致更简单的解决方案,至少应该考虑将部分委派给托管语言。
As mentioned at the top, while this article is necessarily tagged [regex]
for stackoverflow, it is perhaps about more than that. While certainly there's value in learning about assertions, nested references, possessive quantifier, etc, perhaps the bigger lesson here is the creative process by which one can try to solve problems, the determination and hard work that it often requires when you're subjected to various constraints, the systematic composition from various parts to build a working solution, etc.
正如顶部所提到的,虽然这篇文章必须标记[regex]
为 stackoverflow,但它可能不止于此。虽然学习断言、嵌套引用、所有格量词等当然是有价值的,但也许这里更大的教训是人们可以尝试解决问题的创造性过程,当你受到影响时,它通常需要决心和努力工作各种约束,从各个部分的系统组合以构建工作解决方案等。
Bonus material! PCRE recursive pattern!
奖励材料!PCRE递归模式!
Since we did bring up PHP, it needs to be said that PCRE supports recursive pattern and subroutines. Thus, following pattern works for preg_match
(as seen on ideone.com):
既然我们提出了PHP,就需要说PCRE支持递归模式和子程序。因此,以下模式适用于preg_match
(如 ideone.com 所示):
/^(a(?1)?b)$/
Currently Java's regex does not support recursive pattern.
目前 Java 的正则表达式不支持递归模式。
Even more bonus material! Matching anbncn!!
更多的奖励材料!匹配一个nb nc n!!
So we've seen how to match anbnwhich is non-regular, but still context-free, but can we also match anbncn, which isn't even context-free?
因此,我们已经看到了如何匹配一个ñb ñ这是不正规,但还是上下文,但我们也匹配一个ñb ñç ñ,这甚至不是上下文无关?
The answer is, of course, YES!Readers are encouraged to try to solve this on their own, but the solution is provided below (with implementation in Java on ideone.com).
答案当然是肯定的!鼓励读者尝试自己解决这个问题,但下面提供了解决方案(在 ideone.com 上用Java 实现)。
^ (?: a (?= a* (\1?+ b) b* (\2?+ c) ) )+ \1 \2 $
^ (?: a (?= a* (\1?+ b) b* (\2?+ c) ) )+ \1 \2 $
回答by jaytea
Given that no mention has been made of PCRE supporting recursive patterns, I'd just like to point out the simplest and most efficient example of PCRE that describes the language in question:
鉴于没有提到 PCRE 支持递归模式,我只想指出描述所讨论语言的 PCRE 的最简单和最有效的例子:
^
(?<A>a)+
(?<B-A>b)+ (?(A)(?!))
(?<C-B>c)+ (?(B)(?!))
...
(?<Z-Y>z)+ (?(Y)(?!))
$
回答by kennytm
As mentioned in the question — with .NET balancing group, the patterns of the type anbncndn…zncan be matched easily as
正如问题中提到的——使用 .NET 平衡组,类型a nb nc nd n…z n 的模式可以很容易地匹配为
^
(?=(a(?-1)?b)) a+
(?=(b(?-1)?c)) b+
...
(?=(x(?-1)?y)) x+
(y(?-1)?z)
$
For example: http://www.ideone.com/usuOE
例如:http: //www.ideone.com/usuOE
Edit:
编辑:
There is also a PCRE pattern for the generalized language with recursive pattern, but a lookahead is needed. I don't think this is a direct translation of the above.
对于具有递归模式的广义语言,还有一个 PCRE 模式,但需要先行。我不认为这是对上述内容的直接翻译。
##代码##For example: http://www.ideone.com/9gUwF