Python: how to check whether a string contains only ASCII characters?

Question

Asked by JavaSa

How do I check that a string contains only ASCII characters in Python? Something like Ruby's ascii_only? method.

I want to be able to tell whether specific string data read from a file is ASCII.

Answer 1

Answered by warvariuc

Python 3.7 added methods that do what you want:

str, bytes, and bytearray gained support for the new isascii() method, which can be used to test if a string or bytes contain only the ASCII characters.
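
A short illustration of isascii() on the three types, assuming Python 3.7 or later. The sample values are made up for the example:

```python
# Requires Python 3.7+, where isascii() exists on str, bytes and bytearray.
samples = ["string", "строка", b"raw bytes", bytearray(b"\xff\x00"), ""]

# isascii() is True when every character (or byte) is below 128;
# an empty value also counts as ASCII.
results = [value.isascii() for value in samples]

for value, ok in zip(samples, results):
    print(repr(value), ok)
```

Note that the empty string reports True, which may or may not be what you want when validating file contents.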

Otherwise:

>>> all(ord(char) < 128 for char in 'string')
True

>>> all(ord(char) < 128 for char in 'строка')
False

Another version (written for Python 2, where unicode is the text type and str holds bytes):

>>> def is_ascii(text):
...     if isinstance(text, unicode):
...         try:
...             text.encode('ascii')
...         except UnicodeEncodeError:
...             return False
...     else:
...         try:
...             text.decode('ascii')
...         except UnicodeDecodeError:
...             return False
...     return True
...

>>> is_ascii('text')
True

>>> is_ascii(u'text')
True

>>> is_ascii(u'text-строка')
False

>>> is_ascii('text-строка')
False

>>> is_ascii(u'text-строка'.encode('utf-8'))
False
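
For readers on Python 3 before 3.7, the same idea might be ported roughly as follows. This is a sketch, not the answerer's code: it assumes the input is a str, bytes, or bytearray, and it reuses the name is_ascii only for continuity with the function above.

```python
def is_ascii(text):
    """Return True if text (str, bytes or bytearray) holds only ASCII."""
    try:
        if isinstance(text, str):
            # str -> bytes fails on any code point >= 128
            text.encode("ascii")
        else:
            # bytes -> str fails on any byte value >= 128
            text.decode("ascii")
    except UnicodeError:
        # UnicodeError is the parent of UnicodeEncodeError and UnicodeDecodeError
        return False
    return True


print(is_ascii("text"))                         # plain ASCII str
print(is_ascii("text-строка"))                  # str with Cyrillic
print(is_ascii("text-строка".encode("utf-8")))  # UTF-8 bytes
```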

Answer 2

Answered by Quinn

You can also use a regular expression to check for ASCII-only text. [\x00-\x7F] matches a single ASCII character:

>>> import re
>>> OnlyAscii = lambda s: re.match(r'^[\x00-\x7F]+$', s) is not None
>>> OnlyAscii('string')
True
>>> OnlyAscii('Tannhäuser')
False
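
A variant of the regex approach, sketched with a precompiled pattern and re.fullmatch (available since Python 3.4). The names ASCII_ONLY and only_ascii are illustrative. This sketch uses * rather than +, so unlike the lambda above it treats the empty string as ASCII-only; switch back to + if an empty string should be rejected.

```python
import re

# Zero or more characters in the ASCII range; compiled once for reuse.
ASCII_ONLY = re.compile(r"[\x00-\x7F]*")


def only_ascii(s):
    """Return True if every character of s is in the range U+0000..U+007F."""
    # fullmatch requires the whole string to match, so no ^ or $ is needed.
    return ASCII_ONLY.fullmatch(s) is not None


print(only_ascii("string"))
print(only_ascii("Tannhäuser"))
```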

Answer 3

Answered by rotten

If you have unicode strings, you can call the encode method and catch the exception:

try:
    mynewstring = mystring.encode('ascii')
except UnicodeEncodeError:
    print("there are non-ascii characters in there")
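
If it also helps to know where the offending character is, the exception carries that information. A minimal sketch, assuming Python 3 and a str input; first_non_ascii is a hypothetical helper name, not part of any library:

```python
def first_non_ascii(text):
    """Return (index, character) of the first non-ASCII character, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.start is the index in the input where encoding failed.
        return exc.start, text[exc.start]
    return None


print(first_non_ascii("Tannhäuser"))
print(first_non_ascii("plain"))
```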

If you have bytes, you can import the chardet module and check the encoding:

import chardet

# Get the encoding
enc = chardet.detect(mystring)['encoding']

Answer 4

Answered by Girish Jadhav

A workaround to your problem would be to try and encode the string in a particular encoding.

For example:

'H€llø'.encode('utf-8')

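In Python 2, where this literal is a byte string, encode() first decodes it implicitly with the ASCII codec, so the call raises the following error: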

Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

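Now you can catch the UnicodeDecodeError to determine that the string did not contain only ASCII characters. In Python 3, encoding a str to UTF-8 does not raise for non-ASCII text, so encode to 'ascii' and catch UnicodeEncodeError instead.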

try:
    'H€llø'.encode('utf-8')
except UnicodeDecodeError:
    print 'This string contains more than just the ASCII characters.'
