Java: What is a regular expression for control characters?

Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/4893759/

Time: 2020-10-30 08:31:18  Source: igfitidea

What is a regular expression for control characters?

Tags: java, regex, ascii, lexical-analysis

Asked by Cameron Tinker

I'm trying to match a control character in the form \^c where c is any valid character for control characters. I have this regular expression, but it's not currently working: \\[^][@-z]

I think the problem lies with the fact that the caret character (^) is part of the regular expressions parsing engine.

Answered by tchrist

The pattern \^. matches an ASCII text string of the form ^X, nothing more. The pattern \\\^. matches an ASCII text string of the form \^X. You may wish to constrain that dot to [A-Z?@_\[\]^\\], giving \\\^[A-Z?@_\[\]^\\]. The bracketed character class is easier to read as [?\x40-\x5F], hence \\\^[?\x40-\x5F]: a literal BACKSLASH, followed by a literal CIRCUMFLEX, followed by something that turns into one of the valid control characters.

Note that this is the pattern as you would see it printed out, or as you would read it from a file. It is what you need to pass to the regex compiler. If you have it as a Java string literal, you must of course double each of those backslashes: "\\\\\\^[?\\x40-\\x5F]". Yes, it is insane looking, but that is because Java does not support regexes directly, as Groovy and Scala (or Perl and Ruby) do. Regex work is always easier without the extra bbaacckksslllllaasshheesssssess. :)
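
As a rough illustration of the doubling described above, here is a minimal Java sketch. The class name CaretNotationDemo and the helper isCaretNotation are made up for this example and are not part of the original answer; only java.util.regex.Pattern is a real API.

```java
import java.util.regex.Pattern;

public class CaretNotationDemo {
    // The regex the compiler should see is:  \\\^[?\x40-\x5F]
    // Every backslash is doubled once more for the Java string literal.
    static final Pattern CARET_NOTATION = Pattern.compile("\\\\\\^[?\\x40-\\x5F]");

    // Hypothetical helper: true if the whole input is BACKSLASH, CIRCUMFLEX,
    // then one character from [?\x40-\x5F].
    static boolean isCaretNotation(String text) {
        return CARET_NOTATION.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Prints the pattern without the string-literal escaping.
        System.out.println(CARET_NOTATION.pattern());
        System.out.println(isCaretNotation("\\^A")); // text is \^A, prints true
        System.out.println(isCaretNotation("\\^?")); // text is \^?, prints true
        System.out.println(isCaretNotation("\\^a")); // lowercase a is outside the class, prints false
        System.out.println(isCaretNotation("^A"));   // no leading backslash, prints false
    }
}
```

Printing the compiled pattern is a handy way to check that the string literal really produced the regex you intended.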

If you had real control characters instead of indirect representations of them, you would use \pC for all literal code points with the property GC=Other, or \p{Cc} for just GC=Control.
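
For the case of real control characters, a small sketch of the two properties might look like the following. The class and method names are hypothetical. The sample inputs are a TAB (general category Cc) and U+200E LEFT-TO-RIGHT MARK (general category Cf, so it belongs to Other but is not a Control).

```java
import java.util.regex.Pattern;

public class RealControlChars {
    // \p{Cc}: only code points with General_Category=Control.
    static final Pattern CONTROL = Pattern.compile("\\p{Cc}");
    // \pC: the broader "Other" category, which includes Cc and Cf among others.
    static final Pattern OTHER = Pattern.compile("\\pC");

    // Hypothetical helpers: true if the input contains at least one such code point.
    static boolean containsControl(String s) {
        return CONTROL.matcher(s).find();
    }

    static boolean containsOther(String s) {
        return OTHER.matcher(s).find();
    }

    public static void main(String[] args) {
        String withTab = "a\tb";        // TAB is a control character (Cc)
        String withMark = "a\u200Eb";   // LEFT-TO-RIGHT MARK is a format character (Cf)

        System.out.println(containsControl(withTab));   // true
        System.out.println(containsControl(withMark));  // false
        System.out.println(containsOther(withMark));    // true
        System.out.println(containsOther("abc"));       // false
    }
}
```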

Answered by gbvb

Check this out: http://www.regular-expressions.info/characters.html. You should be able to use \cA through \cZ to find the control characters.
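
A minimal sketch of the \cA through \cZ escapes in Java follows; the class and helper names are invented for illustration. Note that these escapes match the actual control characters U+0001 through U+001A, not the textual \^c notation from the question.

```java
import java.util.regex.Pattern;

public class ControlEscapeDemo {
    // \cA is Control-A (U+0001) and \cZ is Control-Z (U+001A).
    static final Pattern CTRL_A_TO_Z = Pattern.compile("[\\cA-\\cZ]");

    // Hypothetical helper: true if the input contains a character in Control-A..Control-Z.
    static boolean containsCtrlLetter(String s) {
        return CTRL_A_TO_Z.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsCtrlLetter("bell\u0007")); // BEL is Control-G, prints true
        System.out.println(containsCtrlLetter("a\tb"));       // TAB is Control-I, prints true
        System.out.println(containsCtrlLetter("\u001B"));     // ESC is Control-[, outside A..Z, prints false
        System.out.println(containsCtrlLetter("A"));          // an ordinary letter, prints false
    }
}
```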
