What is the difference between \z and \Z in a Java regular expression, and when and how do I use them?
Original question: http://stackoverflow.com/questions/2707870/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
What's the difference between \z and \Z in a regular expression, and when and how do I use them?
Asked by Mister M. Bean
From http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html:
\Z The end of the input but for the final terminator, if any
\z The end of the input
But what does this mean in practice? Can you give me an example of when to use \Z and when to use \z?
In my test I thought that "StackOverflow\n".matches("StackOverflow\\z") would return true and "StackOverflow\n".matches("StackOverflow\\Z") would return false. But actually both return false. Where is the mistake?
Answered by Jakob Kruse
Even though \Z and $ only match at the end of the string (when the option for the caret and dollar to match at embedded line breaks is off), there is one exception. If the string ends with a line break, then \Z and $ will match at the position before that line break, rather than at the very end of the string. This "enhancement" was introduced by Perl, and is copied by many regex flavors, including Java, .NET and PCRE. In Perl, when reading a line from a file, the resulting string will end with a line break. Reading a line from a file with the text "joe" results in the string joe\n. When applied to this string, both ^[a-z]+$ and \A[a-z]+\Z will match "joe".

If you only want a match at the absolute very end of the string, use \z (lower case z instead of upper case Z). \A[a-z]+\z does not match joe\n. \z matches after the line break, which is not matched by the character class.
http://www.regular-expressions.info/anchors.html
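The quoted passage can be checked in Java with a small sketch like the one below. The class name JoeAnchors and the helper firstMatch are made up for illustration; it uses find() rather than matches(), so the pattern is not required to consume the whole input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JoeAnchors {
    // Hypothetical helper: returns the text matched by find(), or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "joe\n"; // a line that still carries its trailing line break

        System.out.println(firstMatch("^[a-z]+$", line));     // joe
        System.out.println(firstMatch("\\A[a-z]+\\Z", line)); // joe
        System.out.println(firstMatch("\\A[a-z]+\\z", line)); // null
    }
}
```

The first two patterns match "joe" because $ and \Z accept the position before the final line break; the third finds nothing because \z only accepts the very end of the input, and [a-z]+ cannot consume the line break.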
The way I read this, "StackOverflow\n".matches("StackOverflow\\z") should return false because your pattern does not include the newline.
"StackOverflow\n".matches("StackOverflow\\z\n") => false
"StackOverflow\n".matches("StackOverflow\\Z\n") => true
Answered by Eyal Schneider
Just checked it. It looks like when Matcher.matches() is invoked (as happens behind the scenes in your code), \Z behaves like \z. However, when Matcher.find() is invoked, the two behave differently, as expected. The following returns true:
Pattern p = Pattern.compile("StackOverflow\\Z");
Matcher m = p.matcher("StackOverflow\n");
System.out.println(m.find());
If you replace \Z with \z, it returns false.
I find this a little surprising...
Answered by Avi
Like Eyal said, it works for find() but not for matches().
This actually makes sense. The \Z anchor itself does match the position right before the final line terminator, but the regular expression as a whole does not match, because matches() requires the pattern to match the entire input, and nothing in the pattern matches the terminator. (\Z matches the position right before the terminator, which is not the same thing.)
If you did "StackOverflow\n".matches("(?s)StackOverflow\\Z.*") you should be ok. (The (?s) flag is needed because, by default, . does not match line terminators in Java, so a plain .* could not consume the trailing \n.)
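As a rough check of this point, the sketch below runs a few variants through String.matches() with default flags. The class name MatchesWholeInput and the helper check are illustrative only. Note that by default . does not match line terminators in Java, so a trailing .* only helps when DOTALL, written (?s), is enabled.

```java
class MatchesWholeInput {
    static final String INPUT = "StackOverflow\n";

    // Hypothetical helper: true only if the regex consumes the entire input.
    static boolean check(String regex) {
        return INPUT.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(check("StackOverflow\\Z"));       // false: nothing consumes \n
        System.out.println(check("StackOverflow\\Z.*"));     // false: . skips line terminators
        System.out.println(check("StackOverflow\\Z\\n"));    // true: \n is consumed explicitly
        System.out.println(check("(?s)StackOverflow\\Z.*")); // true: DOTALL lets . match \n
    }
}
```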
Answered by Alan Moore
I think the main problem here is the unexpected behavior of matches(): any match must consume the whole input string. Both of your examples fail because the regexes don't consume the linefeed at the end of the string. The anchors have nothing to do with it.
In most languages, a regex match can occur anywhere, consuming all, some, or none of the input string. Java has a method, Matcher#find(), that performs this traditional kind of match. However, the results are the opposite of what you said you expected:
Pattern.compile("StackOverflow\\z").matcher("StackOverflow\n").find() // false
Pattern.compile("StackOverflow\\Z").matcher("StackOverflow\n").find() // true
In the first example, the \z needs to match the end of the string, but the trailing linefeed is in the way. In the second, the \Z matches before the linefeed, which is at the end of the string.
Answered by code4j
Answered by hey_you
I think Alan Moore provided the best answer, especially the crucial point that matches behaves as if ^ and $ had been silently inserted around its regex argument, so the whole input must be consumed.
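To make the "implicit anchors" idea concrete, here is a minimal sketch that emulates matches() with find(). The helper names anchoredFind and caretDollarFind are invented for this example. Under default flags, the behavior of matches() appears to line up with \A and \z rather than with ^ and $, because $ still accepts the position before a final line terminator.

```java
import java.util.regex.Pattern;

class MatchesEquivalence {
    // Hypothetical helper: wraps the regex in \A(?: ... )\z and uses find().
    static boolean anchoredFind(String regex, String input) {
        return Pattern.compile("\\A(?:" + regex + ")\\z").matcher(input).find();
    }

    // Same idea with ^ and $, for comparison.
    static boolean caretDollarFind(String regex, String input) {
        return Pattern.compile("^(?:" + regex + ")$").matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "StackOverflow";
        String input = "StackOverflow\n";

        System.out.println(input.matches(regex));          // false
        System.out.println(anchoredFind(regex, input));    // false, same as matches()
        System.out.println(caretDollarFind(regex, input)); // true, $ matches before the final \n
    }
}
```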
I'd also like to add a few examples and a little more explanation.
\z matches only at the very end of the string.
\Z also matches at the very end of the string, but if the string ends with a \n, it will also match just before that \n.
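One way to see the two rules side by side is to list every index at which each anchor matches on its own. This is only a sketch; the class AnchorPositions and its positions helper are hypothetical names, and default flags are assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorPositions {
    // Hypothetical helper: collects every index where the zero-width regex matches.
    static List<Integer> positions(String regex, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(positions("\\Z", "four\n")); // [4, 5]: before the \n and at the end
        System.out.println(positions("\\z", "four\n")); // [5]: only at the very end
        System.out.println(positions("\\Z", "four"));   // [4]
        System.out.println(positions("\\z", "four"));   // [4]
    }
}
```

Without a trailing line break the two anchors agree; they differ only in whether the position just before a final \n is accepted.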
Consider this program:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // One or more characters followed by the end of the input,
        // or by the position just before a final line terminator
        Pattern p = Pattern.compile(".+\\Z");
        String text = "one\ntwo\nthree\nfour\n";
        Matcher m = p.matcher(text);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}
It will find 1 match and print "four".
Change \Z to \z, and it will not match anything, because \z refuses to match before the \n and .+ cannot consume the \n to reach the very end.
However, this will also print four, because there's no \n at the end:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // One or more characters followed by the very end of the input
        Pattern p = Pattern.compile(".+\\z");
        String text = "one\ntwo\nthree\nfour";
        Matcher m = p.matcher(text);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}