Are Java and C# regular expressions compatible?
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/538579/
Asked by TREE
Both languages claim to use Perl style regular expressions. If I have one language test a regular expression for validity, will it work in the other? Where do the regular expression syntaxes differ?
The use case here is a C# (.NET) UI talking to an eventual Java back end implementation that will use the regex to match data.
Note that I only need to worry about matching, not about extracting portions of the matched data.
Accepted answer by Drew Noakes
There are quite (a lot of) differences.
Character Class
- Character class subtraction [abc-[cde]]
  - .NET YES (2.0)
  - Java: Emulated via character class intersection and negation: [abc&&[^cde]]
- Character class intersection [abc&&[cde]]
  - .NET: Emulated via character class subtraction and negation: [abc-[^cde]]
  - Java YES
- \p{Alpha} POSIX character class
  - .NET NO
  - Java YES (US-ASCII)
- Under (?x) mode (COMMENTS / IgnorePatternWhitespace), space (U+0020) in a character class is significant.
  - .NET YES
  - Java NO
- Unicode Category (L, M, N, P, S, Z, C)
  - .NET YES: \p{L} form only
  - Java YES:
    - From Java 5: \pL, \p{L}, \p{IsL}
    - From Java 7: \p{general_category=L}, \p{gc=L}
- Unicode Category (Lu, Ll, Lt, ...)
  - .NET YES: \p{Lu} form only
  - Java YES:
    - From Java 5: \p{Lu}, \p{IsLu}
    - From Java 7: \p{general_category=Lu}, \p{gc=Lu}
- Unicode Block
  - .NET YES: \p{IsBasicLatin} only. (See Supported Named Blocks.)
  - Java YES: (the block name is case-insensitive)
    - From Java 5: \p{InBasicLatin}
    - From Java 7: \p{block=BasicLatin}, \p{blk=BasicLatin}
- Spaces and underscores allowed in all long block names (e.g. BasicLatin can be written as Basic_Latin or Basic Latin)
  - .NET NO
  - Java YES (Java 5)
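To make the first two entries concrete, here is a minimal Java sketch of the set operations listed above. The class and method names are made up for illustration; only the two patterns come from the list.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class CharClassSetOps {
    // Java spelling of the .NET subtraction [abc-[cde]]:
    // intersect [abc] with the negation of [cde].
    static final Pattern ABC_MINUS_CDE = Pattern.compile("[abc&&[^cde]]");

    // Native Java intersection of [abc] and [cde].
    static final Pattern ABC_AND_CDE = Pattern.compile("[abc&&[cde]]");

    static boolean inDifference(String s) {
        return ABC_MINUS_CDE.matcher(s).matches();
    }

    static boolean inIntersection(String s) {
        return ABC_AND_CDE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(inDifference("a"));   // true: in [abc], not in [cde]
        System.out.println(inDifference("c"));   // false: removed by the negated set
        System.out.println(inIntersection("c")); // true: the only shared character
    }
}
```

A pattern written with one flavor's set syntax should not be passed to the other engine unchanged; it needs to be rewritten as shown in the list.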
Quantifier
- ?+, *+, ++ and {m,n}+ (possessive quantifiers)
  - .NET NO
  - Java YES
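A short Java sketch of why a possessive quantifier can change the match result, not just performance. The class name and sample patterns are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class PossessiveDemo {
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Greedy a* gives back one 'a' so the trailing 'a' can match.
        System.out.println(fullMatch("a*a", "aaa"));  // true
        // Possessive a*+ keeps every 'a' and never backtracks,
        // so nothing is left for the trailing 'a'.
        System.out.println(fullMatch("a*+a", "aaa")); // false
    }
}
```

Since the list marks possessive quantifiers as unsupported in .NET, a pattern shared between the two sides would have to avoid them.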
Quotation
- \Q...\E escapes a string of metacharacters
  - .NET NO
  - Java YES
- \Q...\E escapes a string of character class metacharacters (in character sets)
  - .NET NO
  - Java YES
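On the Java side, \Q...\E is also what Pattern.quote produces. The sketch below uses a made-up helper name and sample strings to show the effect of quoting.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class QuotingDemo {
    // Matches the input against the text taken literally.
    // Pattern.quote wraps the text in \Q...\E.
    static boolean matchesLiteral(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        // Unquoted, "1+" means "one or more 1s", so "11=2" matches.
        System.out.println(Pattern.matches("1+1=2", "11=2"));  // true
        // Quoted, the plus sign is an ordinary character.
        System.out.println(matchesLiteral("1+1=2", "1+1=2"));  // true
        System.out.println(matchesLiteral("1+1=2", "11=2"));   // false
    }
}
```

On the .NET side, where the list says \Q...\E is not available, Regex.Escape serves a similar purpose by backslash-escaping each metacharacter.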
Matching construct
- Conditional matching (?(?=regex)then|else), (?(regex)then|else), (?(1)then|else) or (?(group)then|else)
  - .NET YES
  - Java NO
- Named capturing group and named backreference
  - .NET YES:
    - Capturing group: (?<name>regex) or (?'name'regex)
    - Backreference: \k<name> or \k'name'
  - Java YES (Java 7):
    - Capturing group: (?<name>regex)
    - Backreference: \k<name>
- Multiple capturing groups can have the same name
  - .NET YES
  - Java NO (Java 7)
- Balancing group definition (?<name1-name2>regex) or (?'name1-name2'subexpression)
  - .NET YES
  - Java NO
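The angle-bracket forms (?<name>regex) and \k<name> are listed for both flavors, so they are the ones to share. A minimal Java 7+ sketch follows; the group names and the date format are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class NamedGroupDemo {
    // Named capturing groups in the (?<name>...) form.
    static final Pattern YEAR_MONTH =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})");

    // Named backreference: the same lowercase letter twice in a row.
    static final Pattern DOUBLED = Pattern.compile("(?<ch>[a-z])\\k<ch>");

    static String yearOf(String input) {
        Matcher m = YEAR_MONTH.matcher(input);
        return m.matches() ? m.group("year") : null;
    }

    static boolean hasDoubledLetter(String input) {
        return DOUBLED.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(yearOf("2024-05"));        // 2024
        System.out.println(hasDoubledLetter("book")); // true
        System.out.println(hasDoubledLetter("bat"));  // false
    }
}
```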
Assertions
- (?<=text) (positive lookbehind)
  - .NET Variable-width
  - Java Obvious width
- (?<!text) (negative lookbehind)
  - .NET Variable-width
  - Java Obvious width
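A lookbehind whose length has an explicit upper bound is the safe form on the Java side. The sketch below uses a made-up pattern with a bounded quantifier; it does not test unbounded lookbehind, whose handling is where the two flavors differ according to the list.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class LookbehindDemo {
    // The lookbehind is between 1 and 3 characters long,
    // so its maximum width is obvious to the Java engine.
    static final Pattern B_AFTER_A = Pattern.compile("(?<=a{1,3})b");

    static boolean found(String input) {
        return B_AFTER_A.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("aab")); // true: 'b' is preceded by 'a'
        System.out.println(found("cb"));  // false: 'b' is preceded by 'c'
    }
}
```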
Mode Options/Flags
- ExplicitCapture option (?n)
  - .NET YES
  - Java NO
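Without an ExplicitCapture option, one way to get a similar effect in Java is to mark every unnamed group as non-capturing with (?:...) by hand. A small sketch, with made-up patterns and names:

```java
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class ExplicitCaptureDemo {
    // Number of capturing groups in the pattern (group 0 is not counted).
    static int capturingGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Plain parentheses capture, so there are two groups here.
        System.out.println(capturingGroups("(\\d+)-(?<id>\\w+)"));   // 2
        // (?:...) groups without capturing; only the named group remains.
        System.out.println(capturingGroups("(?:\\d+)-(?<id>\\w+)")); // 1
    }
}
```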
Miscellaneous
- (?#comment) inline comments
  - .NET YES
  - Java NO
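Java has no (?#comment) form, but the COMMENTS flag allows # comments that run to the end of the line. A minimal sketch; the phone-like format is only an assumed example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class CommentsFlagDemo {
    // With Pattern.COMMENTS, whitespace in the pattern is ignored and
    // everything from # to the end of the line is a comment.
    static final Pattern NUMBER = Pattern.compile(
            "\\d{3}  # first block of digits\n" +
            "-       # literal separator\n" +
            "\\d{4}  # second block of digits",
            Pattern.COMMENTS);

    static boolean isNumber(String input) {
        return NUMBER.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNumber("555-1234")); // true
        System.out.println(isNumber("5551234"));  // false
    }
}
```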
Answer by Rex M
C# regex has its own convention for named groups: (?<name>). I don't know of any other differences.
Answer by Brian Rasmussen
.NET Regex supports counting, so you can match nested parentheses which is something you normally cannot do with a regular expression. According to Mastering Regular Expressions that's one of the few implementations to do that, so that could be a difference.
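If the Java back end needs the same check, one option is to count nesting depth in plain code instead of in the regex. This is only a sketch of that alternative, not something the answer itself proposes; the class and method names are made up.

```java
// Hypothetical demo class, for illustration only.
final class ParenBalanceDemo {
    // Returns true when every '(' has a matching ')' in the right order.
    static boolean isBalanced(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // a ')' appeared before its '('
                }
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("(a(b)c)")); // true
        System.out.println(isBalanced("(()"));     // false
    }
}
```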
Answer by WolfmanDragon
Java uses standard Perl-type regex as well as POSIX regex. Looking at the C# documentation on regexes, it looks like Java has all of the C# regex syntax, but not the other way around.
Compare them yourself: Java: C#:
EDIT: Currently, no other regex flavor supports Microsoft's version of named capture.
Answer by Seth
Check out: http://www.regular-expressions.info/refflavors.html
There is plenty of regex info on that site, and there's a nice chart that details the differences between Java and .NET.
Answer by Alexey Yumashin
From my experience:
Java 7 regular expressions as compared to .NET 2.0 regular expressions:
- Underscore symbol in group names is not supported
- Groups with the same name (in the same regular expression) are not supported (although it may be really useful in expressions using "or"!)
- Groups having captured nothing have value of null and not of an empty string
- Group with index 0 also contains the whole match (same as in .NET) BUT is not included in groupCount()
- Group back reference in replace expressions is also denoted with dollar sign (e.g. $1), but if the same expression contains dollar sign as the end-of-line marker, then the back reference dollar should be escaped (\$), otherwise in Java we get the "illegal group reference" error
- End-of-line symbol ($) behaves greedy. Consider, for example, the following expression (Java-string is given): "bla(bla(?:$|\r\n))+)?$". Here the last line of text will be NOT captured! To capture it, we must substitute "$" with "\z".
- There is no "Explicit Capture" mode.
- Empty string doesn't satisfy the ^.{0}$ pattern.
- Symbol "-" must be escaped when used inside square brackets. That is, pattern "[a-z+-]+" doesn't match string "f+g-h" in Java, but it does in .NET. To match in Java, pattern should look as (Java-string is given): "[a-z+\-]+".
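The two points about null groups and groupCount() can be checked with a few lines of Java. This sketch uses a made-up pattern and helper names and covers only those two items from the list.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class GroupBehaviorDemo {
    // Text captured by the given group after a full match,
    // or null if that group did not take part in the match.
    static String groupAfterMatch(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("input does not match");
        }
        return m.group(group);
    }

    // groupCount() reports capturing groups only; group 0 is excluded.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groupAfterMatch("(a)|(b)", "b", 0)); // b
        System.out.println(groupAfterMatch("(a)|(b)", "b", 1)); // null
        System.out.println(groupCount("(a)|(b)"));              // 2
    }
}
```

Code that iterates over groups therefore needs to run from 0 through groupCount() inclusive to see the whole match, and to check each group for null.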
NOTE: "(Java-string is given)" - just to explain double escapes in the expression.