Python: check whether a string contains only ASCII characters?
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/35889505/
Check that a string contains only ASCII characters?
Asked by JavaSa
How do I check that a string only contains ASCII characters in Python? Something like Ruby's ascii_only?
I want to be able to tell whether specific string data read from a file is ASCII.
Answered by warvariuc
Python 3.7 added methods that do what you want:
str, bytes, and bytearray gained support for the new isascii() method, which can be used to test if a string or bytes contain only ASCII characters.
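As a brief illustration of the isascii() method described above, here is a minimal sketch assuming Python 3.7 or later. The wrapper name is_ascii is only a hypothetical helper for this example, not part of the original answer.

```python
def is_ascii(value):
    """Return True if a str, bytes, or bytearray holds only ASCII characters.

    Relies on the isascii() method, so it needs Python 3.7 or later.
    """
    return value.isascii()


print(is_ascii("string"))             # True
print(is_ascii("строка"))             # False
print(is_ascii(b"raw bytes"))         # True
print(is_ascii(bytearray(b"\xff")))   # False
print(is_ascii(""))                   # True: an empty string counts as ASCII
```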
Otherwise:
>>> all(ord(char) < 128 for char in 'string')
True
>>> all(ord(char) < 128 for char in 'строка')
False
Another version:
>>> def is_ascii(text):
...     # Python 2 only: the `unicode` type does not exist in Python 3.
...     if isinstance(text, unicode):
...         try:
...             text.encode('ascii')
...         except UnicodeEncodeError:
...             return False
...     else:
...         try:
...             text.decode('ascii')
...         except UnicodeDecodeError:
...             return False
...     return True
...
>>> is_ascii('text')
True
>>> is_ascii(u'text')
True
>>> is_ascii(u'text-строка')
False
>>> is_ascii('text-строка')
False
>>> is_ascii(u'text-строка'.encode('utf-8'))
False
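The function above depends on the Python 2 `unicode` type. A rough Python 3 equivalent might look like the sketch below; the name is_ascii_py3 is hypothetical, and the sketch assumes the input is either a str or a bytes-like object that has a decode() method.

```python
def is_ascii_py3(text):
    """Python 3 sketch: return True if a str or bytes value is pure ASCII."""
    try:
        if isinstance(text, str):
            # Text: encoding to ASCII fails on any code point above 127.
            text.encode("ascii")
        else:
            # bytes or bytearray: decoding fails on any byte above 0x7F.
            text.decode("ascii")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return True


print(is_ascii_py3("text"))                         # True
print(is_ascii_py3("text-строка"))                  # False
print(is_ascii_py3("text-строка".encode("utf-8")))  # False
```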
Answered by Quinn
You can also use a regex to check for ASCII-only strings. [\x00-\x7F] matches a single ASCII character:
>>> import re
>>> OnlyAscii = lambda s: re.match(r'^[\x00-\x7F]+$', s) is not None
>>> OnlyAscii('string')
True
>>> OnlyAscii('Tannh‰user')
False
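A variation on the regex approach, offered as a sketch rather than the answerer's own code: a precompiled pattern with re.fullmatch avoids the ^ and $ anchors. The names ASCII_ONLY and only_ascii are hypothetical. Because this sketch uses * instead of +, it treats the empty string as ASCII, unlike the lambda above.

```python
import re

# Zero or more characters in the ASCII range U+0000 to U+007F.
ASCII_ONLY = re.compile(r"[\x00-\x7F]*")


def only_ascii(s):
    """Return True if every character of the str s is ASCII."""
    # fullmatch succeeds only when the whole string matches the pattern.
    return ASCII_ONLY.fullmatch(s) is not None


print(only_ascii("string"))      # True
print(only_ascii("Tannhäuser"))  # False
print(only_ascii(""))            # True
```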
Answered by rotten
If you have unicode strings, you can use the encode method and catch the exception:
try:
    mynewstring = mystring.encode('ascii')
except UnicodeEncodeError:
    print("there are non-ascii characters in there")
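The caught exception also records where encoding failed, which can help locate the offending character. The following is a minimal sketch assuming Python 3 str input; the function name first_non_ascii is hypothetical.

```python
def first_non_ascii(text):
    """Return (index, character) of the first non-ASCII character, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character that failed to encode.
        return exc.start, text[exc.start]
    return None


print(first_non_ascii("plain text"))   # None
print(first_non_ascii("text-строка"))  # (5, 'с')
```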
If you have bytes, you can import the chardet module and check the encoding:
import chardet
# Get the encoding
enc = chardet.detect(mystring)['encoding']
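chardet is a third-party package that guesses an encoding. If the only question is whether bytes read from a file are ASCII, a standard-library check may be enough. The sketch below assumes Python 3.7 or later (for bytes.isascii()); the function name file_is_ascii and the chunk size are illustrative choices, not part of the original answer.

```python
def file_is_ascii(path, chunk_size=64 * 1024):
    """Return True if every byte in the file at `path` is below 0x80."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                # Reached end of file without seeing a non-ASCII byte.
                return True
            if not chunk.isascii():
                return False
```

Reading in chunks keeps memory use bounded for large files. Since the check is per byte, splitting the data at arbitrary chunk boundaries does not affect the result.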
Answered by Girish Jadhav
A workaround would be to try to encode the string in a particular encoding.
For example:
'H€llø'.encode('utf-8')
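In Python 2, where a plain string literal is a byte string that is implicitly decoded as ASCII before being encoded, this raises the following error: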
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
Now you can catch UnicodeDecodeError to determine that the string contains more than just ASCII characters.
try:
    # Python 2: encoding a byte string first decodes it as ASCII.
    'H€llø'.encode('utf-8')
except UnicodeDecodeError:
    print('This string contains more than just the ASCII characters.')