Java: What is a regular expression for control characters?
Note: this page reproduces a popular StackOverFlow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/4893759/
Asked by Cameron Tinker
I'm trying to match a control character in the form \^c where c is any valid character for control characters. I have this regular expression, but it's not currently working: \\[^][@-z]
I think the problem lies with the fact that the caret character (^) is part of the regular expressions parsing engine.
Answer by tchrist
Match an ASCII text string of the form `^X` using the pattern `\^.` and nothing more. Match an ASCII text string of the form `\^X` with the pattern `\\\^.` instead. You may wish to constrain that dot to `[A-Z?@_\[\]^\\]`, giving `\\\^[A-Z?@_\[\]^\\]`. The bracketed character class is easier to read as `[?\x40-\x5F]`, hence `\\\^[?\x40-\x5F]` for a literal BACKSLASH, followed by a literal CIRCUMFLEX, followed by something that turns into one of the valid control characters.
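As a quick sanity check on the claim that the two bracketed classes are interchangeable, the following sketch compares them over every ASCII character. The class name `CaretClassCheck` and the method `sameOverAscii` are made up for illustration; the patterns are written as Java string literals, so each regex backslash appears doubled.

```java
import java.util.regex.Pattern;

public class CaretClassCheck {
    // Regex as the compiler sees it: [A-Z?@_\[\]^\\]
    static final Pattern EXPLICIT = Pattern.compile("[A-Z?@_\\[\\]^\\\\]");
    // Regex as the compiler sees it: [?\x40-\x5F]
    static final Pattern HEX_RANGE = Pattern.compile("[?\\x40-\\x5F]");

    // Returns true if both classes accept exactly the same ASCII characters.
    static boolean sameOverAscii() {
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (EXPLICIT.matcher(s).matches() != HEX_RANGE.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameOverAscii()); // prints "true"
    }
}
```

The range `\x40-\x5F` runs from `@` through `_`, which covers `A-Z` and the punctuation `[`, `\`, `]`, `^` in between, so only `?` has to be listed separately.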
Note that this is the pattern as it looks when printed out, or as you would read it from a file. It's what you need to pass to the regex compiler. If you have it as a string literal, you must of course double each of those backslashes: "\\\\\\^[?\\x40-\\x5F]"
Yes, it is insane looking, but that is because Java does not support regexes directly as Groovy and Scala — or Perl and Ruby — do. Regex work is always easier without the extra bbaacckksslllllaasshheesssssess. :)
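To make the doubling concrete, here is a minimal sketch that compiles the string literal and scans some text for the `\^X` notation. The class `CaretNotationFinder`, the method `findAll`, and the sample sentence are assumptions for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaretNotationFinder {
    // Regex as the compiler sees it: \\\^[?\x40-\x5F]
    // Every backslash is doubled again for the Java string literal.
    static final Pattern CARET_NOTATION = Pattern.compile("\\\\\\^[?\\x40-\\x5F]");

    // Collects every occurrence of backslash, circumflex, control letter.
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = CARET_NOTATION.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // In Java source, "\\^C" is the three characters: backslash, ^, C.
        String sample = "abort with \\^C, erase with \\^?, but not \\^c or ^C";
        System.out.println(findAll(sample)); // prints [\^C, \^?]
    }
}
```

The lowercase `\^c` and the bare `^C` are skipped because the class only admits `?` and the range `@` through `_`, and the pattern requires the leading backslash.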
If you had real control characters instead of indirect representations of them, you would use `\pC` for all literal code points with the property GC=Other, or `\p{Cc}` for just GC=Control.
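The difference between the two properties can be sketched as follows, assuming a sample string that contains one real control character (U+0003) and one format character (U+200B, ZERO WIDTH SPACE, general category Cf). The class `RealControlChars` and the method `count` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RealControlChars {
    // GC=Control only.
    static final Pattern CONTROL = Pattern.compile("\\p{Cc}");
    // GC=Other, which includes control and format characters among others.
    static final Pattern OTHER = Pattern.compile("\\pC");

    // Counts how many matches of the pattern occur in the text.
    static int count(Pattern p, String text) {
        int n = 0;
        Matcher m = p.matcher(text);
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // 'a', U+0003 (control), 'b', U+200B (format), 'c'
        String sample = "a" + (char) 0x03 + "b" + (char) 0x200B + "c";
        System.out.println(count(CONTROL, sample)); // prints 1
        System.out.println(count(OTHER, sample));   // prints 2
    }
}
```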
Answer by gbvb
Check this out: http://www.regular-expressions.info/characters.html. You should be able to use \cA to \cZ to find the control characters.
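A small sketch of what the `\cX` escape does in Java's regex engine: it stands for the real control character (for example `\cC` is U+0003), so it does not match the three-character text `\^C` from the question. The class `ControlEscapeDemo` and its method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class ControlEscapeDemo {
    // Regex as the compiler sees it: \cC (the control character U+0003).
    static final Pattern CTRL_C = Pattern.compile("\\cC");

    // True if the text contains a real Control-C character.
    static boolean containsCtrlC(String text) {
        return CTRL_C.matcher(text).find();
    }

    public static void main(String[] args) {
        String realControl = "abc" + (char) 0x03;  // ends with a real U+0003
        String caretNotation = "abc\\^C";          // ends with backslash, ^, C
        System.out.println(containsCtrlC(realControl));   // prints "true"
        System.out.println(containsCtrlC(caretNotation)); // prints "false"
    }
}
```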