Differences in RegEx syntax between Python and Java
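Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow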
Original URL: http://stackoverflow.com/questions/10492180/
Question by Vineet
I have a working regex in Python and I am trying to convert to Java. It seems that there is a subtle difference in the implementations.
The regex is trying to match another regex written as a literal, such as /\s+/. The regex in question is:
/(\.|[^[/\\n]|\[(\.|[^\]\\n])*])+/([gim]+\b|\B)
One of the strings it has problems with is: /\s+/;
The regex is not supposed to match the trailing ";". In Python the regex works correctly (it does not match the trailing ";"), but in Java the match does include the ";".
The Question(s):
- What can I do to get this RegEx working in Java?
- Based on what I read here, there should be no difference for this regex. Is there a list somewhere of the differences between the regex implementations in Python and Java?
Answer by Vineet
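Java doesn't parse regular expressions the same way as Python in a small set of cases. In this particular case the nested ['s were causing problems. In Python you don't need to escape a [ nested inside a character class (it is treated as a literal, although recent Python 3 versions emit a FutureWarning for it), but in Java you do: an unescaped [ inside a class starts a nested class, which Java merges with the outer one as a union (for example, [a-d[m-p]] means a through d, or m through p).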
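A minimal Java sketch of the nested-bracket behaviour described above; the class name NestedBracketDemo is hypothetical. It shows that Java reads [a-d[m-p]] as a union, so [ itself is not a member of the class, while escaping the brackets turns them into ordinary members. Python would instead read the unescaped pattern as a class containing [, followed by a literal ].

```java
import java.util.regex.Pattern;

public class NestedBracketDemo {
    // Unescaped '[' inside a class: Java reads "[a-d[m-p]]" as the union
    // of the ranges a-d and m-p, so '[' itself is not a member.
    static final Pattern UNION = Pattern.compile("[a-d[m-p]]");

    // Escaped brackets inside a class: '[' and ']' become ordinary members.
    static final Pattern ESCAPED = Pattern.compile("[a-d\\[m-p\\]]");

    public static void main(String[] args) {
        System.out.println(UNION.matcher("n").matches());   // true: n is in m-p
        System.out.println(UNION.matcher("[").matches());   // false: '[' is not a member
        System.out.println(ESCAPED.matcher("[").matches()); // true: '[' is now literal
        System.out.println(ESCAPED.matcher("]").matches()); // true: so is ']'
    }
}
```

This is why the Java-compatible pattern escapes every [ and ] that is meant literally.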
The original RegEx (for Python):
/(\.|[^[/\\n]|\[(\.|[^\]\\n])*])+/([gim]+\b|\B)
The fixed RegEx (for Java and Python):
/(\.|[^\[/\\n]|\[(\.|[^\]\\n])*\])+/([gim]+\b|\B)
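A minimal sketch of the fixed pattern compiled in Java, assuming the \\n in it is meant to exclude a newline character (as it would if the pattern was copied from an ordinary, non-raw Python string). Every regex backslash is doubled for the Java string literal; the class FixedRegexDemo and the helper firstMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedRegexDemo {
    // The fixed pattern with every regex backslash doubled for the Java
    // string literal. Assumes "\n" is meant to exclude a newline character.
    static final Pattern REGEX_LITERAL = Pattern.compile(
            "/(\\.|[^\\[/\\n]|\\[(\\.|[^\\]\\n])*\\])+/([gim]+\\b|\\B)");

    // Hypothetical helper: returns the first match (group 0), or null if none.
    static String firstMatch(String input) {
        Matcher m = REGEX_LITERAL.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The Java source "/\\s+/;" is the runtime string /\s+/;
        System.out.println(firstMatch("/\\s+/;"));  // /\s+/  (no trailing ;)
        System.out.println(firstMatch("/ab/gi;"));  // /ab/gi
        System.out.println(firstMatch("/[/]x/;"));  // /[/]x/
    }
}
```

With find(), group 0 stops before the ";": after the closing /, the \B alternative matches the empty position between "/" and ";", so nothing more is consumed.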
Answer by trutheality
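The obvious difference between Java and Python is that in Java you need to escape a lot of characters: Java has no raw string literals, so every backslash in a regex must be doubled inside a string literal (r"\d+" in Python is "\\d+" in Java).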
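A short sketch of that escaping rule; the class and field names are hypothetical. A pattern written r"\d+\.\d+" in Python needs every backslash doubled in Java, and Pattern.quote() (roughly the counterpart of Python's re.escape()) builds a pattern that matches a literal string exactly.

```java
import java.util.regex.Pattern;

public class EscapingDemo {
    // Python: re.compile(r"\d+\.\d+"). Java has no raw strings, so each
    // regex backslash is written twice in the string literal.
    static final Pattern DECIMAL = Pattern.compile("\\d+\\.\\d+");

    // Pattern.quote() turns a literal string into a pattern that matches it
    // exactly, so "+" here is not a quantifier.
    static final Pattern PLUS_LITERAL = Pattern.compile(Pattern.quote("a+b"));

    public static void main(String[] args) {
        System.out.println(DECIMAL.matcher("pi is 3.14").find());  // true
        System.out.println(DECIMAL.matcher("3x14").find());        // false: '.' is escaped
        System.out.println(PLUS_LITERAL.matcher("a+b").matches()); // true
        System.out.println(PLUS_LITERAL.matcher("aab").matches()); // false
    }
}
```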
Moreover, you are probably running into a mismatch between the matching methods, not a difference in the actual regex notation:
Given the Java code:
String regex, input; // initialized to something
Matcher matcher = Pattern.compile( regex ).matcher( input );
- Java's matcher.matches() (also Pattern.matches( regex, input )) matches the entire string. Python 2 has no direct equivalent (Python 3.4 added re.fullmatch( regex, input ), which is one). The same result can be achieved by using re.match( regex, input ) with a regex that ends with $ (strictly \Z, since $ also matches just before a trailing newline).
- Java's matcher.find() and Python's re.search( regex, input ) match any part of the string.
- Java's matcher.lookingAt() and Python's re.match( regex, input ) match the beginning of the string.
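A small sketch contrasting the three Java methods on the same input; the class and helper names are hypothetical. The comments name the closest Python function for each, including re.fullmatch, which Python 3.4 added as the counterpart of matches().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchMethodsDemo {
    // matches(): the entire input must match (like Python 3.4+ re.fullmatch).
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // lookingAt(): a match anchored at the start only (like re.match).
    static boolean prefixMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // find(): a match anywhere (like re.search); each further call continues
    // scanning after the previous match, so a loop collects every match.
    static List<String> allMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "42abc7";
        System.out.println(wholeMatch("\\d+", input));  // false: "abc7" is left over
        System.out.println(prefixMatch("\\d+", input)); // true: "42" at the start
        System.out.println(allMatches("\\d+", input));  // [42, 7]
    }
}
```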
For more details, read Java's documentation of Matcher and compare it to the Python documentation.
Since you said that isn't the problem, I decided to do a test: http://ideone.com/6w61T. It looks like Java is doing exactly what you need it to (group 0, the entire match, doesn't contain the ";"). Your problem is elsewhere.