.NET: Regex for all printable characters

Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same CC BY-SA license and attribute it to the original authors (not me), citing the original address and author information: StackOverflow, original URL: http://stackoverflow.com/questions/1247762/

Time: 2020-09-03 13:07:42  Source: igfitidea

Regex for all PRINTABLE characters

.net regex

Asked by Alan Moore

Is there a special regex statement like \w that denotes all printable characters? I'd like to validate that a string contains only characters that can be printed, i.e. that it does not contain ASCII control characters like \b (backspace), or null, etc. Anything on the keyboard is fine, and so are Unicode characters.

If there isn't a special statement, how can I specify this in a regex?

Accepted answer by zombat

There is a POSIX character class designation, [:print:], that should match printable characters, and [:cntrl:] for control characters. Note that these are defined over the codes of the ASCII table, so they might not be suitable for matching other encodings.

Failing that, the expression [\x00-\x1f] will match the ASCII control characters, although again, these could be printable in other encodings.
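
As a rough illustration of the [\x00-\x1f] fallback, here is a minimal Java sketch (Java is chosen only because other answers in this thread discuss its regex flavor; the class and method names are hypothetical). It reports whether a string contains any character in that range. Note that the range as written does not cover DEL (\x7F), so that character is not flagged.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class AsciiControlCheck {
    // The range from the answer: ASCII codes 0x00 through 0x1F.
    private static final Pattern C0_CONTROL = Pattern.compile("[\\x00-\\x1f]");

    // Returns true if the string contains at least one character in that range.
    static boolean hasControlChar(String s) {
        return C0_CONTROL.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasControlChar("hello world"));  // false
        System.out.println(hasControlChar("bell\u0007"));    // true
        System.out.println(hasControlChar("del\u007F"));     // false: DEL lies outside the range
    }
}
```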

Answer by Arman H

Very late to the party, but this regexp works: /[ -~]/.

How? It matches all characters in the range from space (ASCII decimal 32) to tilde (ASCII decimal 126), which is the range of all printable ASCII characters.

If you want to strip everything outside that range (control characters as well as non-ASCII characters), you could use something like:

$someString.replace(/[^ -~]/g, '');

NOTE: this is not valid .NET code, but an example of regexp usage for those who stumble upon this via search engines later.
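
For comparison, the same [ -~] range can be sketched in Java. This is only an illustration under assumed names (PrintableAscii, isPrintableAscii and stripNonPrintable are hypothetical), not the answer author's code. It shows both a validation check and the stripping step.

```java
// Hypothetical helper class; the names are illustrative only.
final class PrintableAscii {
    // True if every character lies between space (32) and tilde (126).
    static boolean isPrintableAscii(String s) {
        return s.matches("[ -~]*");
    }

    // Removes every character outside the space-to-tilde range.
    static String stripNonPrintable(String s) {
        return s.replaceAll("[^ -~]", "");
    }

    public static void main(String[] args) {
        System.out.println(isPrintableAscii("Hello, world! ~"));    // true
        System.out.println(isPrintableAscii("tab\there"));          // false
        System.out.println(stripNonPrintable("caf\u00e9\u0007!"));  // caf!
    }
}
```

Because the range is ASCII-only, accented letters such as the one in the last call are removed along with the control character, so this approach does not fit when non-ASCII text must be kept.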

Answer by Alan Moore

If your regex flavor supports Unicode properties, this is probably the best way:

\P{Cc}

That matches any character that's not a control character, whether it be ASCII -- [\x00-\x1F\x7F] -- or Latin1 -- [\x80-\x9F] (also known as the C1 control characters).
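
A minimal sketch of how \P{Cc} could be used for whole-string validation, written in Java, whose regex flavor supports Unicode general categories. The class and method names are hypothetical, and the pattern is repeated with * so that it covers the entire string.

```java
// Hypothetical helper; the names are illustrative only.
final class NoControlChars {
    // True if no character in the string belongs to the Unicode "Control" category.
    static boolean hasNoControlChars(String s) {
        return s.matches("\\P{Cc}*");
    }

    public static void main(String[] args) {
        System.out.println(hasNoControlChars("na\u00efve caf\u00e9 \u65e5\u672c\u8a9e"));  // true
        System.out.println(hasNoControlChars("line\nbreak"));                            // false
        System.out.println(hasNoControlChars("c1 control\u0085"));                       // false
    }
}
```

Tab, carriage return and line feed are control characters too, so a check like this rejects multi-line input unless those are allowed separately.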

The problem with POSIX classes like [:print:] or \p{Print} is that they can match different things depending on the regex flavor and, possibly, the locale settings of the underlying platform. In Java, they're strictly ASCII-oriented. That means \p{Print} matches only the ASCII printing characters -- [\x20-\x7E] -- while \P{Cntrl} (note the capital 'P') matches everything that's not an ASCII control character -- [^\x00-\x1F\x7F]. That is, it matches any ASCII character that isn't a control character, or any non-ASCII character -- including C1 control characters.
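
The Java behavior described above can be checked with a small sketch. It assumes the default Pattern flags (in particular, that UNICODE_CHARACTER_CLASS is not enabled); the class and method names are hypothetical.

```java
// Hypothetical demo class; names are illustrative only.
final class JavaClassComparison {
    // Tests a single-character string against one character-class pattern.
    static boolean matchesClass(String ch, String charClass) {
        return ch.matches(charClass);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";  // a Latin-1 letter, non-ASCII
        String nel = "\u0085";     // NEL, a C1 control character

        System.out.println(matchesClass("A", "\\p{Print}"));     // true
        System.out.println(matchesClass(eAcute, "\\p{Print}"));  // false: ASCII only
        System.out.println(matchesClass(nel, "\\P{Cntrl}"));     // true: not an ASCII control
        System.out.println(matchesClass(nel, "\\P{Cc}"));        // false: it is a Unicode control
    }
}
```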

Answer by Norman Ramsey

It depends wildly on what regex package you are using. This is one of those situations about which some wag said that the great thing about standards is that there are so many to choose from.

If you happen to be using C, the isprint(3) function/macro is your friend.

Answer by hashable

In Java, the \p{Print} option specifies the printable character class.

Answer by Adarsha

Adding on to @Alan-Moore: \P{Cc} is actually an example of a Negative Unicode Category or Unicode Block (ref: Character Classes in Regular Expressions). \P{name} matches any character that does not belong to a Unicode general category or named block. See the referenced link for more examples of named blocks supported in .NET.
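
To illustrate the negated \P{name} form, here is a small Java sketch (class and method names are hypothetical). One caveat applies: Java spells named blocks with an In prefix, as in \P{InGreek}, whereas .NET uses an Is prefix such as \P{IsGreek}, so the block pattern below is the Java spelling, not the .NET one.

```java
// Hypothetical demo of negated Unicode categories and blocks in Java.
final class NegatedUnicodeClasses {
    // True if the single-character string is NOT an uppercase letter (category Lu).
    static boolean notUppercase(String ch) {
        return ch.matches("\\P{Lu}");
    }

    // True if the single-character string is NOT in the Greek block (Java spelling).
    static boolean notGreekBlock(String ch) {
        return ch.matches("\\P{InGreek}");
    }

    public static void main(String[] args) {
        System.out.println(notUppercase("a"));        // true
        System.out.println(notUppercase("A"));        // false
        System.out.println(notGreekBlock("A"));       // true
        System.out.println(notGreekBlock("\u03a9"));  // false: capital omega is in the Greek block
    }
}
```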
