Python: check whether a string contains only ASCII characters?

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original question: http://stackoverflow.com/questions/35889505/

Date: 2020-08-19 17:07:01  Source: igfitidea

Check that a string contains only ASCII characters?

Tags: python, python-2.7

Asked by JavaSa

How do I check that a string only contains ASCII characters in Python? Something like Ruby's ascii_only?

I want to be able to tell whether specific string data read from a file is ASCII.

Answer by warvariuc

Python 3.7 added methods that do what you want:

str, bytes, and bytearray gained support for the new isascii() method, which can be used to test if a string or bytes contain only the ASCII characters.
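
As a brief illustration of the methods quoted above, here is a minimal sketch. It assumes Python 3.7 or later, and the sample values are made up for the example:

```python
# Requires Python 3.7+, where isascii() exists on str, bytes and bytearray.
samples = ["string", "строка", b"raw bytes", bytearray(b"\xff\x00")]

# Each call returns True only if every character (or byte) is below 128.
results = [value.isascii() for value in samples]
for value, ok in zip(samples, results):
    print(repr(value), ok)

# An empty string or empty bytes object also counts as ASCII.
empty_is_ascii = "".isascii() and b"".isascii()
```

Note that isascii() checks the value as it is, so data read from a file in binary mode can be tested as bytes without decoding it first.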

Otherwise:

>>> all(ord(char) < 128 for char in 'string')
True

>>> all(ord(char) < 128 for char in 'строка')
False

Another version:

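>>> def is_ascii(text):
...     # Python 2 only: unicode is the text type, str holds bytes.
...     if isinstance(text, unicode):
...         try:
...             text.encode('ascii')
...         except UnicodeEncodeError:
...             return False
...     else:
...         try:
...             text.decode('ascii')
...         except UnicodeDecodeError:
...             return False
...     return True
...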

>>> is_ascii('text')
True

>>> is_ascii(u'text')
True

>>> is_ascii(u'text-строка')
False

>>> is_ascii('text-строка')
False

>>> is_ascii(u'text-строка'.encode('utf-8'))
False
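
The function above relies on the Python 2 unicode type. A rough Python 3 equivalent might look like the sketch below. It is an adaptation, not part of the original answer, and assumes the input is either str or a bytes-like object with a decode method (bytes or bytearray):

```python
def is_ascii(text):
    """Return True if a str, bytes or bytearray value contains only ASCII."""
    try:
        if isinstance(text, str):
            # str -> bytes: fails on any code point >= 128
            text.encode("ascii")
        else:
            # bytes -> str: fails on any byte >= 0x80
            text.decode("ascii")
    except UnicodeError:
        # UnicodeError is the parent of both UnicodeEncodeError
        # and UnicodeDecodeError, so one handler covers both branches.
        return False
    return True


print(is_ascii("text"))
print(is_ascii("text-строка"))
print(is_ascii("text-строка".encode("utf-8")))
```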

Answer by Quinn

You can also use a regex to check that a string contains only ASCII characters. [\x00-\x7F] matches a single ASCII character:

>>> import re
>>> OnlyAscii = lambda s: re.match(r'^[\x00-\x7F]+$', s) is not None
>>> OnlyAscii('string')
True
>>> OnlyAscii('Tannhäuser')
False
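
The same character class can be negated to find where the non-ASCII characters are, which may help when inspecting data read from a file. This is a sketch for Python 3; the names NON_ASCII, non_ascii_positions and only_ascii are hypothetical helpers, not part of the original answer:

```python
import re

# Matches any single character outside the ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def non_ascii_positions(text):
    """Return (index, character) pairs for every non-ASCII character."""
    return [(m.start(), m.group()) for m in NON_ASCII.finditer(text)]


def only_ascii(text):
    # Unlike the '+' pattern above, this treats the empty string as ASCII.
    return NON_ASCII.search(text) is None


print(only_ascii("string"))
print(non_ascii_positions("Tannh\u00e4user"))
```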

Answer by rotten

If you have unicode strings, you can call the encode method and catch the exception:

try:
    mynewstring = mystring.encode('ascii')
except UnicodeEncodeError:
    print("there are non-ascii characters in there")
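
The caught exception also records where encoding failed, so the try/except above can be wrapped into a helper that reports the first offending character. A minimal Python 3 sketch, where first_non_ascii is a hypothetical name chosen for this example:

```python
def first_non_ascii(text):
    """Return (index, character) of the first non-ASCII character, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.start is the index in `text` where encoding first failed.
        return exc.start, text[exc.start]
    return None


print(first_non_ascii("plain text"))
print(first_non_ascii("na\u00efve"))
```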

If you have bytes, you can import the chardet module (a third-party package) and check the detected encoding:

import chardet

# Get the encoding
enc = chardet.detect(mystring)['encoding']

Answer by Girish Jadhav

A workaround to your problem would be to try and encode the string in a particular encoding.

For example:

'H€llø'.encode('utf-8')

In Python 2, where this literal is a byte string, calling encode() first decodes it implicitly as ASCII, so this will throw the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

Now you can catch the UnicodeDecodeError to determine that the string did not contain only ASCII characters.

try:
    'H€llø'.encode('utf-8')
except UnicodeDecodeError:
    print('This string contains more than just the ASCII characters.')