Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/4345673/

Time: 2020-10-30 05:59:40  Source: igfitidea

Java regex for any symbol?

Tags: java, regex, unicode, character-properties

Question by Skizit

Is there a regex which accepts any symbol?

EDIT: To clarify what I'm looking for: I want to build a regex that will accept ANY number of whitespace characters, and then it must contain at least 1 symbol (e.g. , . " ' $ £ etc.) or (not exclusive or) at least 1 character.

Answer by aioobe

Yes. The dot (.) will match any symbol, at least if you use it in conjunction with the Pattern.DOTALL flag (otherwise it won't match newline characters). From the docs:

In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.
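A minimal sketch of that documented behavior (the class and method names are hypothetical): in the default mode, . matches an ordinary symbol such as $ but not a line feed, while Pattern.DOTALL (equivalently, the embedded flag (?s)) lets it match the line feed as well.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // True if the whole input is exactly one character matched by "."
    public static boolean dotMatches(String input, int flags) {
        return Pattern.compile(".", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(dotMatches("$", 0));                // true: ordinary character
        System.out.println(dotMatches("\n", 0));               // false: line terminator
        System.out.println(dotMatches("\n", Pattern.DOTALL));  // true: DOTALL includes line terminators
    }
}
```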

Regarding your edit:

I want to build a regex that will accept ANY number of whitespace characters, and then it must contain at least 1 symbol (e.g. , . " ' $ £ etc.) or (not exclusive or) at least 1 character.

Here is a suggestion:

\s*\S+
  • \s* : any number of whitespace characters
  • \S+ : one or more ("at least one") non-whitespace characters.
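As a hedged illustration (the class and method names are hypothetical), here is the suggested pattern in Java source, where every backslash must be doubled. Whether it fits the question depends on the matching method: matches() requires the whole input to be leading whitespace followed by a single unbroken run of non-whitespace, while find() only asks whether a non-whitespace character appears anywhere.

```java
import java.util.regex.Pattern;

public class WhitespaceThenSymbol {
    // The Java literal "\\s*\\S+" is the regex \s*\S+
    private static final Pattern WS_THEN_NON_WS = Pattern.compile("\\s*\\S+");

    // Whole-string check: optional leading whitespace, then one run of non-whitespace
    public static boolean matchesWhole(String input) {
        return WS_THEN_NON_WS.matcher(input).matches();
    }

    // Substring check: is there at least one non-whitespace character anywhere?
    public static boolean containsMatch(String input) {
        return WS_THEN_NON_WS.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("   $"));  // true
        System.out.println(matchesWhole("   "));   // false: no non-whitespace character
        System.out.println(matchesWhole("a b"));   // false: whitespace after the first run
        System.out.println(containsMatch("a b"));  // true
    }
}
```

If an input such as "a b" should also pass a whole-string match, one possible variant (not part of the answer) is \s*\S.*, compiled with DOTALL if the remainder may contain line breaks.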

Answer by tchrist

In Java, a symbol is \pS, which is not the same as punctuation characters, which are \pP.
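A minimal sketch (hypothetical class name) that classifies the example characters from the question: , . " and ' fall under \pP (category Po), while $ and £ fall under \pS (category Sc, currency symbols). Note that this \pP is not the POSIX-style \p{Punct}, which in Java is an ASCII-only class that includes $.

```java
import java.util.regex.Pattern;

public class SymbolVsPunctuation {
    // \pS: Unicode symbols (categories Sm, Sc, Sk, So)
    private static final Pattern SYMBOL = Pattern.compile("\\pS");
    // \pP: Unicode punctuation (categories Pc, Pd, Ps, Pe, Pi, Pf, Po)
    private static final Pattern PUNCTUATION = Pattern.compile("\\pP");

    public static String classify(String ch) {
        if (SYMBOL.matcher(ch).matches()) return "symbol";
        if (PUNCTUATION.matcher(ch).matches()) return "punctuation";
        return "neither";
    }

    public static void main(String[] args) {
        // Example characters from the question; the last entry is the pound sign, written as an escape
        String[] examples = {",", ".", "\"", "'", "$", "\u00A3"};
        for (String ch : examples) {
            System.out.println(ch + " -> " + classify(ch));
        }
    }
}
```

A class meant to accept all of the question's examples would therefore need both categories, for example [\pP\pS].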

I discuss this issue, and enumerate the general categories of all the ASCII punctuation and symbol characters, here in this answer.
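The linked answer is not reproduced here, but a similar enumeration can be sketched with Character.getType. This is an illustration under that assumption, not the author's listing; the class and helper names are hypothetical. It groups the printable ASCII characters 0x21 to 0x7E by their punctuation (P*) or symbol (S*) category.

```java
import java.util.Map;
import java.util.TreeMap;

public class AsciiCategories {
    // Maps a Character.getType() value to its two-letter Unicode general category,
    // or null if it is neither punctuation (P*) nor a symbol (S*)
    static String categoryName(int type) {
        switch (type) {
            case Character.CONNECTOR_PUNCTUATION:     return "Pc";
            case Character.DASH_PUNCTUATION:          return "Pd";
            case Character.START_PUNCTUATION:         return "Ps";
            case Character.END_PUNCTUATION:           return "Pe";
            case Character.INITIAL_QUOTE_PUNCTUATION: return "Pi";
            case Character.FINAL_QUOTE_PUNCTUATION:   return "Pf";
            case Character.OTHER_PUNCTUATION:         return "Po";
            case Character.MATH_SYMBOL:               return "Sm";
            case Character.CURRENCY_SYMBOL:           return "Sc";
            case Character.MODIFIER_SYMBOL:           return "Sk";
            case Character.OTHER_SYMBOL:              return "So";
            default:                                  return null; // letters, digits, controls
        }
    }

    // Groups printable, non-space ASCII characters (0x21..0x7E) by P*/S* category
    public static Map<String, String> groupPrintableAscii() {
        Map<String, String> groups = new TreeMap<>();
        for (char c = 0x21; c <= 0x7E; c++) {
            String cat = categoryName(Character.getType(c));
            if (cat != null) {
                // Append the character to its category's string, in code point order
                groups.merge(cat, String.valueOf(c), String::concat);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        groupPrintableAscii().forEach((cat, chars) -> System.out.println(cat + ": " + chars));
    }
}
```

Among other things, this should show that $ is the only ASCII currency symbol (Sc), that + < = > | ~ are math symbols (Sm), that ^ and ` are modifier symbols (Sk), and that ASCII contains no So, Pi or Pf characters.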

Patterns like [\p{Alnum}\s] only work on legacy ASCII data from the 1960s. To work with Java's native character set (Unicode), you need something along the lines of:

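// Identifier characters: letters, marks, decimal digits, letter numbers, connector punctuation,
// plus the "other symbol" (So) characters of the Enclosed Alphanumerics block
String identifier_charclass = "[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]";
// Whitespace code points; backslashes are doubled so the regex engine (not javac) decodes the escapes
String whitespace_charclass = "[\\u0009\\u000A\\u000B\\u000C\\u000D\\u0020\\u0085\\u00A0\\u1680\\u180E\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2007\\u2008\\u2009\\u200A\\u2028\\u2029\\u202F\\u205F\\u3000]";

String ident_or_white = "[" + identifier_charclass + whitespace_charclass + "]";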
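A minimal sketch of putting such classes to use, assuming Java 7 or later (needed for the \p{IsWhite_Space} property and the Pattern.UNICODE_CHARACTER_CLASS flag); all class, constant and method names are hypothetical. It compiles the identifier class from the answer as a valid Java literal, swaps the hand-written whitespace list for the White_Space property, and, for comparison, shows that the later UNICODE_CHARACTER_CLASS flag makes the legacy [\p{Alnum}\s] Unicode-aware.

```java
import java.util.regex.Pattern;

public class UnicodeAwareClasses {
    // The identifier class from the answer, as a valid Java literal (backslashes doubled)
    static final String IDENTIFIER_CHARCLASS =
        "[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]";
    // Java 7+: the White_Space binary property instead of a hand-maintained list
    static final String WHITESPACE_PROPERTY = "\\p{IsWhite_Space}";

    // Union of both classes, one or more times
    static final Pattern IDENT_OR_WHITE =
        Pattern.compile("[" + IDENTIFIER_CHARCLASS + WHITESPACE_PROPERTY + "]+");

    // The legacy class: ASCII-only by default ...
    static final Pattern LEGACY = Pattern.compile("[\\p{Alnum}\\s]+");
    // ... and made Unicode-aware by the Java 7+ flag
    static final Pattern LEGACY_UNICODE =
        Pattern.compile("[\\p{Alnum}\\s]+", Pattern.UNICODE_CHARACTER_CLASS);

    public static boolean identOrWhite(String s) { return IDENT_OR_WHITE.matcher(s).matches(); }
    public static boolean legacy(String s) { return LEGACY.matcher(s).matches(); }
    public static boolean legacyUnicode(String s) { return LEGACY_UNICODE.matcher(s).matches(); }

    public static void main(String[] args) {
        String words = "na\u00EFve caf\u00E9";          // the words naive and cafe with accents
        System.out.println(identOrWhite(words));        // true
        System.out.println(legacy(words));              // false: the accented letters are not ASCII
        System.out.println(legacyUnicode(words));       // true
        System.out.println(identOrWhite("a\u00A0b"));   // true: NO-BREAK SPACE is White_Space
        System.out.println(identOrWhite("$5"));         // false: $ is a symbol (Sc)
    }
}
```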

I'm sorry that Java makes it so difficult to work with modern data, but at least it is possible.

Just don't ask about boundaries or grapheme clusters. For that, see my other postings.
