Original question: http://stackoverflow.com/questions/30195587/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
What does regex "\\p{Z}" mean?
Asked by BRabbit27
I am working with some code in Java that has a statement like
String tempAttribute = ((String) attributes.get(i)).replaceAll("\\p{Z}","");
I am not used to regex, so what is the meaning of it? (If you could provide a website to learn the basics of regex that would be wonderful) I've seen that for a string like
ept as y
it gets transformed into eptasy, but this doesn't seem right. I believe the guy who wrote this maybe wanted to trim leading and trailing spaces.
Answered by Alex Shesterov
It removes all the whitespace (replaces all whitespace matches with empty strings).
A wonderful regex tutorial is available at regular-expressions.info. A citation from this site:
\p{Z} or \p{Separator}: any kind of whitespace or invisible separator.
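To make the difference between removing every separator and trimming only the ends concrete, here is a minimal sketch. The class name SeparatorTrimDemo, its method names and the sample string with leading and trailing spaces are made up for illustration; note that inside a Java string literal the backslash has to be doubled.

```java
class SeparatorTrimDemo {
    // Removes every character in Unicode category Z (Separator), wherever it occurs.
    static String removeAllSeparators(String input) {
        return input.replaceAll("\\p{Z}", "");
    }

    // Removes runs of separators only at the very start and the very end of the string.
    static String trimSeparators(String input) {
        return input.replaceAll("^\\p{Z}+|\\p{Z}+$", "");
    }

    public static void main(String[] args) {
        String value = " ept as y "; // hypothetical input
        System.out.println("[" + removeAllSeparators(value) + "]"); // [eptasy]
        System.out.println("[" + trimSeparators(value) + "]");      // [ept as y]
    }
}
```

If only plain ASCII spaces are expected, String.trim() is a simpler way to get the second behaviour.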
Answered by sbecker11
The OP stated that the code fragment was in Java. To comment on the statement:
\p{Z} or \p{Separator}: any kind of whitespace or invisible separator.
the sample code below shows that this does not apply in Java.
public class Main {
    public static void main(String[] args) {
        // some normal white space characters
        String str = "word1 \t \n \f \r " + '\u000B' + " word2";
        // various regex patterns meant to remove ALL white spaces
        // (the backslash must be doubled inside a Java string literal)
        String s = str.replaceAll("\\s", "");
        String p = str.replaceAll("\\p{Space}", "");
        String b = str.replaceAll("\\p{Blank}", "");
        String z = str.replaceAll("\\p{Z}", "");
        // \s removed all white spaces
        System.out.println("s [" + s + "]\n");
        // \p{Space} removed all white spaces
        System.out.println("p [" + p + "]\n");
        // \p{Blank} removed only \t and spaces not \n\f\r
        System.out.println("b [" + b + "]\n");
        // \p{Z} removed only spaces not \t\n\f\r
        System.out.println("z [" + z + "]\n");
        // NOTE: \p{Separator} throws a PatternSyntaxException
        try {
            String t = str.replaceAll("\\p{Separator}", "");
            System.out.println("t [" + t + "]\n"); // N/A
        } catch (Exception e) {
            System.out.println("throws " + e.getClass().getName() +
                    " with message\n" + e.getMessage());
        }
    } // public static void main
}
The output for this is:
s [word1word2]
p [word1word2]
b [word1
word2]
z [word1
word2]
throws java.util.regex.PatternSyntaxException with message
Unknown character property name {Separator} near index 12
\p{Separator}
^
This shows that in Java \\p{Z} removes only the spaces and leaves \t, \n, \f, \r and \u000B in place, so it does not match "any kind of whitespace or invisible separator".
These results also show that in Java \\p{Separator} throws a PatternSyntaxException.
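The tab, newline, form feed, carriage return and vertical tab in the test string are Unicode control characters (category Cc), not separators, which would explain why \\p{Z} leaves them alone. The sketch below looks at the opposite case: characters such as the no-break space (U+00A0) and the line separator (U+2028) are in category Z but are not matched by Java's default \\s. The class name, method names and sample string are assumptions made for this example; combining both in one character class is just one possible way to cover the two sets.

```java
class WhitespaceClassesDemo {
    // Removes what \s matches by default: space, \t, \n, \u000B, \f, \r.
    static String removeAsciiWhitespace(String input) {
        return input.replaceAll("\\s", "");
    }

    // Removes Unicode separators (categories Zs, Zl, Zp) only.
    static String removeSeparators(String input) {
        return input.replaceAll("\\p{Z}", "");
    }

    // Removes anything matched by either \s or \p{Z}.
    static String removeBoth(String input) {
        return input.replaceAll("[\\s\\p{Z}]", "");
    }

    public static void main(String[] args) {
        // tab, no-break space, line separator and a plain space between letters
        String sample = "a\tb\u00A0c\u2028d e";
        System.out.println(removeAsciiWhitespace(sample).equals("ab\u00A0c\u2028de")); // true
        System.out.println(removeSeparators(sample).equals("a\tbcde"));                // true
        System.out.println(removeBoth(sample));                                        // abcde
    }
}
```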
Answered by Tung Linh
First of all, \p means you are going to match a class, a collection of characters, not a single one. For reference, this is the Javadoc of the Pattern class: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Unicode scripts, blocks, categories and binary properties are written with the \p and \P constructs as in Perl. \p{prop} matches if the input has the property prop, while \P{prop} does not match if the input has that property.
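As a small illustration of the \p / \P pair described in that quote, the sketch below uses the letter category L instead of Z because the effect is easier to see in printed output. The class name, method names and sample string are made up for this example.

```java
class PropertyNegationDemo {
    // \p{L} matches characters that ARE letters, so this strips them out.
    static String removeLetters(String input) {
        return input.replaceAll("\\p{L}", "");
    }

    // \P{L} matches characters that are NOT letters, so only letters remain.
    static String keepOnlyLetters(String input) {
        return input.replaceAll("\\P{L}", "");
    }

    public static void main(String[] args) {
        String sample = "abc 123 def"; // hypothetical input
        System.out.println("[" + removeLetters(sample) + "]");   // [ 123 ]
        System.out.println("[" + keepOnlyLetters(sample) + "]"); // [abcdef]
    }
}
```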
And then Z is the name of a class (collection, set) of characters. In this case, it's the abbreviation of Separator. Separator contains 3 subclasses: Space_Separator (Zs), Line_Separator (Zl) and Paragraph_Separator (Zp).
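One way to check which of the three subclasses a given character falls into is Character.getType, which returns the general category as one of the constants on java.lang.Character. The following is a rough sketch; the class name, the categoryOf helper and the chosen sample characters are assumptions made for illustration.

```java
class SeparatorSubcategoriesDemo {
    // Maps the general category reported by Character.getType to its short name.
    static String categoryOf(char c) {
        switch (Character.getType(c)) {
            case Character.SPACE_SEPARATOR:     return "Zs";
            case Character.LINE_SEPARATOR:      return "Zl";
            case Character.PARAGRAPH_SEPARATOR: return "Zp";
            default:                            return "not a separator";
        }
    }

    public static void main(String[] args) {
        System.out.println(categoryOf(' '));      // Zs
        System.out.println(categoryOf('\u00A0')); // Zs (no-break space)
        System.out.println(categoryOf('\u2028')); // Zl
        System.out.println(categoryOf('\u2029')); // Zp
        System.out.println(categoryOf('\t'));     // not a separator

        // The subcategories can also be used directly in a pattern.
        System.out.println("\u2028".matches("\\p{Zl}")); // true
        System.out.println(" ".matches("\\p{Zl}"));      // false
    }
}
```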
For the characters each of those classes contains, see: Unicode Character Database or Unicode Character Categories
Further documentation: http://www.unicode.org/reports/tr18/#General_Category_Property