Regex for all printable characters in .NET

Question

Asked by Alan Moore

Is there a special regex statement like \w that denotes all printable characters? I'd like to validate that a string contains only characters that can be printed--i.e. it does not contain ASCII control characters like \a (bell), or null, etc. Anything on the keyboard is fine, and so are UTF chars.

If there isn't a special statement, how can I specify this in a regex?

Answer 1

Accepted answer by zombat

There is a POSIX character class designation, [:print:], that should match printable characters, and [:cntrl:] for control characters. Note that these match codes throughout the ASCII table, so they might not be suitable for matching other encodings.

Failing that, the expression [\x00-\x1f] will match the ASCII control characters, although again, these could be printable in other encodings.
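As a rough illustration of the range-based check, here is a minimal Java sketch. Java is an assumption made for the example (other answers in this thread discuss it), and the class and method names are hypothetical. It rejects any string that contains a character in [\x00-\x1f]; note that DEL (\x7F) lies outside that range and is therefore accepted.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class ControlCharCheck {
    // Matches a single ASCII control character in the range 0x00-0x1F.
    private static final Pattern ASCII_CONTROL = Pattern.compile("[\\x00-\\x1f]");

    // Returns true when the input contains no character in 0x00-0x1F.
    static boolean hasNoAsciiControl(String input) {
        return !ASCII_CONTROL.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasNoAsciiControl("Hello, world!")); // true
        System.out.println(hasNoAsciiControl("tab\there"));     // false
        System.out.println(hasNoAsciiControl("caf\u00e9"));     // true
    }
}
```

Because the check is a blacklist of one ASCII range, non-ASCII text passes through untouched, which fits the question's requirement that UTF characters be allowed.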

Answer 2

Answer by Arman H

Very late to the party, but this regexp works: /[ -~]/.

How? It matches all characters in the range from space (ASCII decimal 32) to tilde (ASCII decimal 126), which is the range of all printable ASCII characters.

If you want to strip everything outside that range (control characters as well as non-ASCII characters), you could use something like:

$someString.replace(/[^ -~]/g, '');

NOTE: this is not valid .NET code, but an example of regexp usage for those who stumble upon this via search engines later.
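For comparison, the same [ -~] range can be sketched in Java. This is only an illustration under assumptions: Java is chosen for the example, and the class and method names are hypothetical. It shows both stripping and whole-string validation. Keep in mind that this range also discards accented and other non-ASCII characters, which the question wanted to keep.

```java
// Hypothetical helper for illustration; not part of any library.
final class AsciiPrintable {
    // Removes every character outside the range space (32) to tilde (126).
    static String keepPrintableAscii(String input) {
        return input.replaceAll("[^ -~]", "");
    }

    // True when the whole string consists only of printable ASCII characters
    // (an empty string is accepted).
    static boolean isPrintableAscii(String input) {
        return input.matches("[ -~]*");
    }

    public static void main(String[] args) {
        System.out.println(keepPrintableAscii("a\tb\u00e9c")); // abc
        System.out.println(isPrintableAscii("Hello ~"));       // true
        System.out.println(isPrintableAscii("line\n"));        // false
    }
}
```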

Answer 3

Answer by Alan Moore

If your regex flavor supports Unicode properties, this is probably the best way:

\P{Cc}

That matches any character that's not a control character, whether it be ASCII -- [\x00-\x1F\x7F] -- or Latin1 -- [\x80-\x9F] (also known as the C1 control characters).

The problem with POSIX classes like [:print:] or \p{Print} is that they can match different things depending on the regex flavor and, possibly, the locale settings of the underlying platform. In Java, they're strictly ASCII-oriented. That means \p{Print} matches only the ASCII printing characters -- [\x20-\x7E] -- while \P{Cntrl} (note the capital 'P') matches everything that's not an ASCII control character -- [^\x00-\x1F\x7F]. That is, it matches any ASCII character that isn't a control character, or any non-ASCII character--including C1 control characters.
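The difference between the three classes can be seen in a small Java sketch. It is illustrative only: the class, field, and method names are hypothetical, and it relies on Java's default behaviour, where POSIX-style classes are ASCII-only unless the UNICODE_CHARACTER_CLASS flag is set.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class PrintableClassesDemo {
    // Unicode general category: anything that is not a control character (Cc).
    static final Pattern NOT_CC = Pattern.compile("\\P{Cc}*");
    // POSIX-style class; ASCII-only in Java unless UNICODE_CHARACTER_CLASS is set.
    static final Pattern PRINT = Pattern.compile("\\p{Print}*");
    // Negated POSIX-style class: anything that is not an ASCII control character.
    static final Pattern NOT_CNTRL = Pattern.compile("\\P{Cntrl}*");

    // True when every character of the input is matched by the pattern.
    static boolean allMatch(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        String accented = "caf\u00e9";              // contains a non-ASCII letter
        String c1Control = "a" + (char) 0x85 + "b"; // contains U+0085, a C1 control
        for (String s : new String[] {accented, c1Control}) {
            System.out.println(allMatch(NOT_CC, s) + " "
                    + allMatch(PRINT, s) + " "
                    + allMatch(NOT_CNTRL, s));
        }
        // Prints:
        // true false true
        // false false true
    }
}
```

Only \P{Cc} gives the answer the question asks for in both cases: it accepts the accented string and rejects the one containing a C1 control character.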

Answer 4

Answer by Norman Ramsey

It depends wildly on what regex package you are using. This is one of those situations about which some wag said that the great thing about standards is that there are so many to choose from.

If you happen to be using C, the isprint(3) function/macro is your friend.

Answer 5

Answer by hashable

In Java, the \p{Print} option specifies the printable character class.

Answer 6

Answer by Adarsha

Adding on to @Alan-Moore, \P{Cc} is actually an example of a Negative Unicode Category or Unicode Block (ref: Character Classes in Regular Expressions). \P{name} matches any character that does not belong to a Unicode general category or named block. See the referred link for more examples of named blocks supported in .NET.
