Differences in RegEx syntax between Python and Java

Disclaimer: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/10492180/

Time: 2020-10-31 01:17:52  Source: igfitidea

Tags: java, python, regex

Asked by Vineet

I have a working regex in Python and I am trying to convert it to Java. It seems that there is a subtle difference between the implementations.

The regex is trying to match another regex. The regex in question is:

/(\\.|[^[/\\\n]|\[(\\.|[^\]\\\n])*])+/([gim]+\b|\B)

One of the strings that it is having problems on is: /\s+/;

The regex is not supposed to match the trailing ;. In Python the regex works correctly (it does not match the trailing ;), but in Java the match does include the ;.
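As a rough illustration of the Python behavior described here, the sketch below runs the pattern against the problem string. It assumes the pattern is written with doubled backslashes (\\. and \\\n), as it would be in a Python raw string; the names JS_REGEX_LITERAL and sample are made up for this example.

```python
import re

# Assumption: the backslashes are doubled, as in a raw string
# (\\. matches an escaped character, \\\n excludes backslash and newline).
JS_REGEX_LITERAL = re.compile(
    r'/(\\.|[^[/\\\n]|\[(\\.|[^\]\\\n])*])+/([gim]+\b|\B)'
)

sample = r'/\s+/;'  # the problem string from the question
match = JS_REGEX_LITERAL.match(sample)

# The match ends at the closing slash, so the trailing ; is not part of it.
print(match.group(0))
```

Under that assumption, the match stops right after the second slash, which is the result the question expects from Java as well.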

The Question(s):

  1. What can I do to get this RegEx working in Java?
  2. Based on what I read here, there should be no difference for this RegEx. Is there a list somewhere of the differences between the RegEx implementations in Python and Java?

Answered by Vineet

Java doesn't parse regular expressions the same way as Python in a small set of cases. In this particular case, the nested [ characters were causing problems: Java treats an unescaped [ inside a character class as the start of a nested class, whereas Python treats it as a literal. In Python you don't need to escape a nested [, but in Java you do.

The original RegEx (for Python):

/(\\.|[^[/\\\n]|\[(\\.|[^\]\\\n])*])+/([gim]+\b|\B)

The fixed RegEx (for Java and Python):

/(\\.|[^\[/\\\n]|\[(\\.|[^\]\\\n])*\])+/([gim]+\b|\B)
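On the Python side, escaping the brackets should not change what the pattern matches. The sketch below compares the two patterns on a few inputs; it assumes both are written with doubled backslashes as in raw strings, and the sample inputs and the helper matched_text are hypothetical, chosen only for illustration.

```python
import re

# Assumption: both patterns use doubled backslashes, as in raw strings.
ORIGINAL = r'/(\\.|[^[/\\\n]|\[(\\.|[^\]\\\n])*])+/([gim]+\b|\B)'
FIXED = r'/(\\.|[^\[/\\\n]|\[(\\.|[^\]\\\n])*\])+/([gim]+\b|\B)'

# Hypothetical inputs: a plain literal, one with flags, one whose character
# class contains a slash, and one that is not a regex literal at all.
samples = [r'/\s+/;', r'/ab+c/gi', r'/[/]x/g', 'no slashes here']


def matched_text(pattern, text):
    """Return the text matched at the start of the input, or None."""
    m = re.match(pattern, text)
    return m.group(0) if m else None


results = [(matched_text(ORIGINAL, s), matched_text(FIXED, s)) for s in samples]
for s, (old, new) in zip(samples, results):
    print(repr(s), repr(old), repr(new), old == new)
```

This only checks Python; whether the escaped form behaves the same way in Java still has to be confirmed with a Java test.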

Answered by trutheality

The obvious difference between Java and Python is that in Java you need to escape a lot of characters.

Moreover, you are probably running into a mismatch between the matching methods, not a difference in the actual regex notation:

Given the following Java code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex, input; // initialized to something
Matcher matcher = Pattern.compile( regex ).matcher( input );
  • Java's matcher.matches() (also Pattern.matches( regex, input )) matches the entire string. Python 3.4 and later provide re.fullmatch( regex, input ) as a direct equivalent; in earlier versions, the same result can be achieved by using re.match( regex, input ) with a regex that ends with $.
  • Java's matcher.find() and Python's re.search( regex, input ) match any part of the string.
  • Java's matcher.lookingAt() and Python's re.match( regex, input ) match the beginning of the string.
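The Python half of this comparison can be sketched as follows. The pattern and inputs are hypothetical, and re.fullmatch (available since Python 3.4) is included as the closest counterpart to Java's matcher.matches().

```python
import re

regex = r'\d+'        # hypothetical pattern: one or more digits
text = 'abc 123 def'  # hypothetical input

# re.search scans the whole string, like Java's matcher.find().
found_anywhere = re.search(regex, text)

# re.match only tries at the beginning, like Java's matcher.lookingAt().
at_start_miss = re.match(regex, text)
at_start_hit = re.match(regex, '123 def')

# re.fullmatch requires the entire string to match, like matcher.matches().
whole_miss = re.fullmatch(regex, '123 def')
whole_hit = re.fullmatch(regex, '123')

# Approximating a full match by anchoring the pattern with $.
anchored_miss = re.match(regex + '$', '123 def')
anchored_hit = re.match(regex + '$', '123')

print(found_anywhere.group(0), at_start_hit.group(0), whole_hit.group(0))
```

Picking the method whose anchoring matches the Java call being ported avoids results that differ only because one side matched the whole string and the other matched a part of it.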

For more details, also read Java's documentation of Matcher and compare it to the Python documentation.

Since you said that isn't the problem, I decided to do a test: http://ideone.com/6w61T. It looks like Java is doing exactly what you need it to (group 0, the entire match, doesn't contain the ;). Your problem is elsewhere.
