What does the Java regex "\\p{Z}" mean?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/30195587/

Time: 2020-11-02 16:38:33  Source: igfitidea

What does regex "\\p{Z}" mean?

Tags: java, regex, replaceall

Asked by BRabbit27

I am working with some code in Java that has a statement like

String tempAttribute = ((String) attributes.get(i)).replaceAll("\\p{Z}", "");

I am not used to regex, so what is the meaning of it? (If you could provide a website to learn the basics of regex that would be wonderful) I've seen that for a string like

"ept as y" it gets transformed into "eptasy", but this doesn't seem right. I believe the guy who wrote this maybe wanted to trim leading and trailing spaces.
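
To illustrate the difference the question is getting at, here is a minimal sketch (not from the original post) that contrasts removing every separator with removing only leading and trailing ones. The class and method names are hypothetical, and the anchored pattern is just one possible way to express "trim" with \p{Z}.

```java
class SeparatorTrimDemo {

    // Removes every character in Unicode category Z, wherever it occurs.
    static String removeAllSeparators(String input) {
        return input.replaceAll("\\p{Z}", "");
    }

    // Removes category-Z characters only at the start and end of the string.
    // (Hypothetical helper; the anchors ^ and $ restrict the match to the edges.)
    static String trimSeparators(String input) {
        return input.replaceAll("^\\p{Z}+|\\p{Z}+$", "");
    }

    public static void main(String[] args) {
        String sample = "  ept as y ";
        System.out.println("[" + removeAllSeparators(sample) + "]"); // [eptasy]
        System.out.println("[" + trimSeparators(sample) + "]");      // [ept as y]
        System.out.println("[" + sample.trim() + "]");               // [ept as y]
    }
}
```

If only the edges should be cleaned up, String.trim() is usually the simpler choice. The anchored regex is mainly of interest when non-ASCII separators such as the no-break space also need to go.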

Answered by Alex Shesterov

It removes all the whitespace (replaces all whitespace matches with empty strings).

A wonderful regex tutorial is available at regular-expressions.info. A citation from this site:

\p{Z} or \p{Separator}: any kind of whitespace or invisible separator.

Answered by sbecker11

The OP stated that the code fragment was in Java. To comment on the statement:

\p{Z} or \p{Separator}: any kind of whitespace or invisible separator.

the sample code below shows that this does not apply in Java.

public class WhitespaceDemo {

    public static void main(String[] args) {

        // some normal white space characters
        String str = "word1 \t \n \f \r " + '\u000B' + " word2";

        // various regex patterns meant to remove ALL white spaces
        // (inside a Java string literal the regex backslash must be doubled)
        String s = str.replaceAll("\\s", "");
        String p = str.replaceAll("\\p{Space}", "");
        String b = str.replaceAll("\\p{Blank}", "");
        String z = str.replaceAll("\\p{Z}", "");

        // \s removed all white spaces
        System.out.println("s [" + s + "]\n");

        // \p{Space} removed all white spaces
        System.out.println("p [" + p + "]\n");

        // \p{Blank} removed only \t and spaces not \n\f\r
        System.out.println("b [" + b + "]\n");

        // \p{Z} removed only spaces not \t\n\f\r
        System.out.println("z [" + z + "]\n");

        // NOTE: \p{Separator} throws a PatternSyntaxException
        try {
            String t = str.replaceAll("\\p{Separator}", "");
            System.out.println("t [" + t + "]\n"); // N/A
        } catch (Exception e) {
            System.out.println("throws " + e.getClass().getName() +
                    " with message\n" + e.getMessage());
        }

    } // public static void main
}

The output for this is:

s [word1word2]

p [word1word2]

b [word1


word2]

z [word1    


word2]

throws java.util.regex.PatternSyntaxException with message
Unknown character property name {Separator} near index 12
\p{Separator}
            ^

This shows that in Java \\p{Z} removes only spaces and not "any kind of whitespace or invisible separator".

These results also show that in Java \\p{Separator} throws a PatternSyntaxException.
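
As a complement to the test above, the following sketch (not part of the original answer) checks single characters against \s and \p{Z}. It assumes default Pattern flags, i.e. without UNICODE_CHARACTER_CLASS, and the class and method names are hypothetical. It suggests that the two classes overlap on the plain space but otherwise cover different characters: \s covers the ASCII control whitespace, while \p{Z} covers Unicode separators such as the no-break space.

```java
class SeparatorVsWhitespaceDemo {

    // True if the single character c matches the given regex.
    static boolean matches(String regex, char c) {
        return String.valueOf(c).matches(regex);
    }

    // Removes both ASCII whitespace (\s) and Unicode separators (\p{Z}).
    static String stripAll(String input) {
        return input.replaceAll("[\\s\\p{Z}]", "");
    }

    public static void main(String[] args) {
        char tab = '\t';               // control character, category Cc
        char noBreakSpace = '\u00A0';  // category Zs
        char lineSeparator = '\u2028'; // category Zl

        System.out.println(matches("\\s", tab));              // true
        System.out.println(matches("\\p{Z}", tab));           // false
        System.out.println(matches("\\s", noBreakSpace));     // false
        System.out.println(matches("\\p{Z}", noBreakSpace));  // true
        System.out.println(matches("\\p{Z}", lineSeparator)); // true

        System.out.println("[" + stripAll("a\t b\u00A0c") + "]"); // [abc]
    }
}
```

Combining both classes in one character class, as stripAll does, is one way to remove "any kind of whitespace or invisible separator" in the sense of the quoted definition.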

Answered by Tung Linh

First of all, \p means you are going to match a class, that is, a collection of characters rather than a single one. For reference, this is the Javadoc of the Pattern class: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

Unicode scripts, blocks, categories and binary properties are written with the \p and \P constructs as in Perl. \p{prop} matches if the input has the property prop, while \P{prop} does not match if the input has that property.
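
A small sketch of the \p versus \P distinction described in that Javadoc excerpt, using the category Z from the question. The class and method names are hypothetical and only serve the illustration.

```java
class NegatedPropertyDemo {

    // Deletes the separator characters (category Z) and keeps everything else.
    static String dropSeparators(String input) {
        return input.replaceAll("\\p{Z}", "");
    }

    // Deletes everything that is NOT in category Z, so only separators remain.
    static String keepSeparators(String input) {
        return input.replaceAll("\\P{Z}", "");
    }

    public static void main(String[] args) {
        String sample = "ept as y";
        System.out.println("[" + dropSeparators(sample) + "]"); // [eptasy]
        System.out.println("[" + keepSeparators(sample) + "]"); // [  ] (the two spaces)
    }
}
```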

And then Z is the name of a class (collection, set) of characters. In this case, it is the abbreviation of Separator. Separator contains three subclasses: Space_Separator (Zs), Line_Separator (Zl) and Paragraph_Separator (Zp).
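
The three subclasses can be inspected from Java itself. The sketch below (an illustration, with a hypothetical class and helper name) maps the constants returned by Character.getType to the short category names and checks a few sample characters against \p{Z} and \p{Zs}.

```java
class SeparatorCategoriesDemo {

    // Returns the short Unicode category name (Zs, Zl, Zp) for a separator,
    // or "other" for any character outside category Z.
    static String separatorCategory(char c) {
        switch (Character.getType(c)) {
            case Character.SPACE_SEPARATOR:     return "Zs";
            case Character.LINE_SEPARATOR:      return "Zl";
            case Character.PARAGRAPH_SEPARATOR: return "Zp";
            default:                            return "other";
        }
    }

    public static void main(String[] args) {
        // space, no-break space, line separator, paragraph separator, tab
        char[] samples = { ' ', '\u00A0', '\u2028', '\u2029', '\t' };
        for (char c : samples) {
            String s = String.valueOf(c);
            System.out.printf("U+%04X category=%s  \\p{Z}=%b  \\p{Zs}=%b%n",
                    (int) c, separatorCategory(c),
                    s.matches("\\p{Z}"), s.matches("\\p{Zs}"));
        }
    }
}
```

In this sketch the tab falls under "other" because it is a control character, which is consistent with \p{Z} leaving tabs and newlines untouched.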

To see which characters those classes contain, refer to the Unicode Character Database or Unicode Character Categories.

Further documentation: http://www.unicode.org/reports/tr18/#General_Category_Property
