C#: What is a "nested quantifier" and why is it causing my regex to fail?
Original question: http://stackoverflow.com/questions/210206/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not to this page): Stack Overflow
Asked by ctrlShiftBryan
I have this regex, which I built and tested in RegexBuddy:
"_ [ 0-9]{10}+ {1}+[ 0-9]{10}+ {2}+[ 0-9]{6}+ {2}[ 0-9]{2}"
When I use this in .NET (C#), I receive the exception:
"parsing \"_ [ 0-9]{10}+ +[ 0-9]{10}+ +[ 0-9]{6}+ [ 0-9]{2}\" - Nested quantifier +."
What does this error mean? Apparently .NET doesn't like the expression.
Here is RegexBuddy's breakdown of the pattern, so you can understand my intention with the regex:
_ [ 0-9]{10}+ {1}+[ 0-9]{10}+ {2}+[ 0-9]{6}+ {2}[ 0-9]{2}
Match the characters "_ " literally «_ »
Match a single character present in the list below «[ 0-9]{10}+»
   Exactly 10 times «{10}+»
   The character " " « »
   A character in the range between "0" and "9" «0-9»
Match the character " " literally « {1}+»
   Exactly 1 times «{1}+»
Match a single character present in the list below «[ 0-9]{10}+»
   Exactly 10 times «{10}+»
   The character " " « »
   A character in the range between "0" and "9" «0-9»
Match the character " " literally « {2}+»
   Exactly 2 times «{2}+»
Match a single character present in the list below «[ 0-9]{6}+»
   Exactly 6 times «{6}+»
   The character " " « »
   A character in the range between "0" and "9" «0-9»
Match the character " " literally « {2}»
   Exactly 2 times «{2}»
Match a single character present in the list below «[ 0-9]{2}»
   Exactly 2 times «{2}»
   The character " " « »
   A character in the range between "0" and "9" «0-9»
In short...
What is a nested quantifier?
Accepted answer by Duncan
.NET is complaining about the + after the {n}-style quantifier, as it doesn't make any sense there. {n} means match exactly n repetitions of the preceding element; + means match one or more repetitions of it. Putting one quantifier directly after another is what .NET reports as a nested quantifier. Remove the +'s and it'll compile fine:
"_ [ 0-9]{10} {1}[ 0-9]{10} {2}[ 0-9]{6} {2}[ 0-9]{2}"
Answer by Duncan
They're right. This version of your regex doesn't fail:
(_ [ 0-9]{10})+(\s{1})+([ 0-9]{10})+(\s{2})+([ 0-9]{6})+\s{2}[ 0-9]{2}
Notice the use of parens to create groups that then can repeat one or more times. Also, you should be more specific and use \s instead of a space, as pattern whitespace may or may not have significance.
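The point about pattern whitespace can be sketched in C# as follows. This is only an illustration, assuming the RegexOptions.IgnorePatternWhitespace option is in play; the class name, helper method, and the "a b" sample strings are hypothetical and not part of the original answer. Note that \s also matches tabs and line breaks, so it is broader than a single literal space.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternWhitespaceDemo
{
    // Hypothetical helper: runs IsMatch with IgnorePatternWhitespace switched on.
    public static bool MatchesIgnoringPatternWhitespace(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern, RegexOptions.IgnorePatternWhitespace);
    }

    public static void Main()
    {
        // With the option on, the unescaped space in the pattern is dropped,
        // so the pattern "a b" behaves like "ab".
        Console.WriteLine(MatchesIgnoringPatternWhitespace("a b", "a b"));

        // \s is an escape sequence, not pattern whitespace, so it still
        // matches the space in the input.
        Console.WriteLine(MatchesIgnoringPatternWhitespace("a b", @"a\sb"));

        // Without the option, the literal space in the pattern is significant.
        Console.WriteLine(Regex.IsMatch("a b", "a b"));
    }
}
```

Under these assumptions, only the first check should fail to match, which is why \s is the safer choice when the options applied to the pattern are not known in advance.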
BTW, this regex doesn't look all that useful. You might want to ask another question along the lines of "How do I use regex to match this pattern?"
Answer by stevemegson
.NET doesn't support the possessive quantifier
{10}+
However, {10} should have exactly the same effect. The + avoids backtracking and trying shorter matches if the longest match fails, but since {10} can only match exactly 10 characters to start with, this doesn't achieve much.
"_ [ 0-9]{10} [ 0-9]{10} {2}[ 0-9]{6} {2}[ 0-9]{2}"
should be fine. I've also dropped the "{1}+" bit. Since it matches exactly once, "A{1}+" is equivalent to just "A".
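A minimal C# sketch of the behaviour discussed here, assuming a hypothetical sample line (the input string, class name, and helper method are invented for illustration and do not come from the question). It checks whether .NET accepts the original pattern and the simplified one, then tries the simplified pattern against the sample.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedQuantifierDemo
{
    // Pattern from the question: .NET reads each "{n}+" as a quantifier
    // applied to another quantifier.
    public const string Original =
        "_ [ 0-9]{10}+ {1}+[ 0-9]{10}+ {2}+[ 0-9]{6}+ {2}[ 0-9]{2}";

    // Pattern from this answer: "+" signs and the redundant "{1}" removed.
    public const string Simplified =
        "_ [ 0-9]{10} [ 0-9]{10} {2}[ 0-9]{6} {2}[ 0-9]{2}";

    // Returns true if the Regex constructor accepts the pattern and
    // false if the parser rejects it with an ArgumentException.
    public static bool Compiles(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Hypothetical input, invented for this sketch.
        string sample = "_ 1234567890 1234567890  123456  12";

        Console.WriteLine(Compiles(Original));
        Console.WriteLine(Compiles(Simplified));
        Console.WriteLine(Regex.IsMatch(sample, Simplified));
    }
}
```

The parse error surfaces when the Regex is constructed, not when it is first used for matching, so a try/catch around the constructor is enough to detect it.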
EDIT: As Porges says, if you do need possessive quantifiers in .NET, then atomic groups give the same functionality, with (?>[0-9]*) being equivalent to [0-9]*+.
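To make the difference visible, here is a small C# sketch comparing an ordinary greedy quantifier with the atomic-group form. The anchored patterns, the class name, and the input "1230" are assumptions chosen for illustration; they are not from the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AtomicGroupDemo
{
    // Greedy: [0-9]* may give back characters so the trailing 0 can match.
    public const string Greedy = @"^[0-9]*0$";

    // Atomic: once (?>[0-9]*) has consumed the digits, it never gives them back.
    public const string Atomic = @"^(?>[0-9]*)0$";

    public static void Main()
    {
        string input = "1230";

        // The greedy version backtracks one digit and succeeds.
        Console.WriteLine(Regex.IsMatch(input, Greedy));

        // The atomic version has already swallowed the final 0 and cannot
        // backtrack, so the overall match fails.
        Console.WriteLine(Regex.IsMatch(input, Atomic));
    }
}
```

This is the same behaviour a possessive [0-9]*+ would have in a flavor that supports it: the digits are matched once, and no shorter alternative is tried.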
Answer by Jan Goyvaerts
If you select the .NET flavor in the toolbar at the top in RegexBuddy, RegexBuddy will indicate that .NET does not support possessive quantifiers such as {10}+.
Since {10} allows only for one specific number of repetitions, making it lazy or possessive is pointless, even if it is syntactically valid in the regex flavors that support lazy and/or possessive quantifiers. Removing the + signs from your regex will make it work fine with .NET.
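As a rough illustration of why a modifier on a fixed count changes nothing, the C# sketch below compares {10} with its lazy form {10}?, which .NET does accept. The class name, helper method, and input string are hypothetical, made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FixedCountDemo
{
    // Hypothetical helper: returns the text of the first match, or "" if none.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        // Invented input with more than ten digits.
        string input = "123456789012345";

        // Plain fixed count: exactly ten digits.
        string plain = FirstMatch(input, "[0-9]{10}");

        // Lazy fixed count: still exactly ten digits, because there is no
        // shorter or longer repetition count to prefer.
        string lazy = FirstMatch(input, "[0-9]{10}?");

        Console.WriteLine(plain == lazy);
    }
}
```

The same reasoning applies to the possessive form in flavors that support it: with only one allowed repetition count, there is nothing to backtrack into.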
In other situations, double-click on the error about the possessive quantifier in the Create tab in RegexBuddy. RegexBuddy will then replace the possessive quantifier with a functionally equivalent atomic group.
If you generate a source code snippet for a .NET language on the Use tab in RegexBuddy, RegexBuddy will automatically replace possessive quantifiers in the regex in the source code snippet.