Python UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte
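Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must likewise follow the CC BY-SA license, credit the original URL and author information, and attribute it to the original authors (not me): StackOverflow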
Original URL: http://stackoverflow.com/questions/22216076/
Question by Dipak Ingole
I am using Python 2.6 CGI scripts, but I found this error in the server log while calling json.dumps():
Traceback (most recent call last):
  File "/etc/mongodb/server/cgi-bin/getstats.py", line 135, in <module>
    print json.dumps(__getdata())
  File "/usr/lib/python2.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte
Here, the __getdata() function returns a dictionary {}.
Before posting this question, I referred to this question on SO.
UPDATES
The following line is breaking the JSON encoder:
import datetime
import json

now = datetime.datetime.now()
now = datetime.datetime.strftime(now, '%Y-%m-%dT%H:%M:%S.%fZ')
print json.dumps({'current_time': now})  # this is the culprit
I found a temporary fix for it:
print json.dumps({'old_time': now.encode('ISO-8859-1').strip()})
But I am not sure whether this is the correct way to do it.
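This minimal sketch shows why byte 0xa5 produces exactly this message. It assumes Python 3; the question uses Python 2, where the same bytes would sit inside a plain str. Only the byte value comes from the traceback. The surrounding bytes, and the guess that the data was really Latin-1, are hypothetical.

```python
# 0xa5 is 0b10100101. In UTF-8, bytes of the form 10xxxxxx are continuation
# bytes, so one can never begin a character ("invalid start byte").
raw = b"\xa5100"  # hypothetical data: 0xa5 followed by the ASCII digits "100"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    reason, position = exc.reason, exc.start
    print(exc)  # 'utf-8' codec can't decode byte 0xa5 in position 0: invalid start byte

# If the bytes were really written as Latin-1, decoding with that codec works:
# 0xa5 is the yen sign (U+00A5) there.
text = raw.decode("latin-1")
print(text)  # ¥100
```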
Accepted answer by Santosh Ghimire
The error occurs because the dictionary contains some non-ASCII character that can't be encoded/decoded. One simple way to avoid this error is to encode such strings with the encode() function as follows (if a is the string with the non-ASCII character):
a.encode('utf-8').strip()
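In Python 3 the picture is slightly different. str is already Unicode, and passing raw bytes to json.dumps raises a TypeError rather than a UnicodeDecodeError. The usual fix there is to decode byte strings once, with the codec they were really written in, before serializing. Here is a minimal sketch that assumes the bytes are Latin-1 (hypothetical data):

```python
import json

raw = b"\xa5100"  # hypothetical Latin-1 bytes (0xa5 is the yen sign)

# Decode first; json.dumps then serializes the resulting str without errors.
data = {"price": raw.decode("latin-1")}

escaped = json.dumps(data)                        # non-ASCII escaped by default
readable = json.dumps(data, ensure_ascii=False)   # keep the character as-is
print(escaped)   # {"price": "\u00a5100"}
print(readable)  # {"price": "¥100"}
```

With the default ensure_ascii=True the output is pure ASCII, which is the safest choice for a CGI response whose charset you don't control.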
Answer by Dipak Ingole
The following line is breaking the JSON encoder:
import datetime
import json

now = datetime.datetime.now()
now = datetime.datetime.strftime(now, '%Y-%m-%dT%H:%M:%S.%fZ')
print json.dumps({'current_time': now})  # this is the culprit
I found a temporary fix for it:
print json.dumps({'old_time': now.encode('ISO-8859-1').strip()})
Marking this as the answer because it works as a temporary fix (though I'm not sure it is correct).
Answer by HimalayanCoder
Set the default encoding at the top of your code:
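import sys
# Python 2 only: site.py deletes sys.setdefaultencoding at startup,
# so reload(sys) is needed to get it back.
reload(sys)
sys.setdefaultencoding("ISO-8859-1")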
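This hack only works on Python 2. In Python 3 the default encoding is fixed at UTF-8, and sys no longer has a setdefaultencoding function. The equivalent fix there is to name the codec explicitly wherever bytes are decoded. A small check, assuming Python 3 (the sample bytes are hypothetical):

```python
import sys

# Python 3 always reports UTF-8 here and no longer exposes a setter.
default = sys.getdefaultencoding()
has_setter = hasattr(sys, "setdefaultencoding")
print(default, has_setter)  # utf-8 False

# Instead of changing a global default, name the codec at the decode site.
text = b"caf\xe9".decode("ISO-8859-1")
print(text)  # café
```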
Answer by JCF
Your string has a non-ASCII character encoded in it.
Failing to decode with utf-8 can happen if you've had to use other encodings in your code. For example:
>>> 'my weird character \x96'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 19: invalid start byte
In this case, the encoding is windows-1252, so you have to do:
>>> 'my weird character \x96'.decode('windows-1252')
u'my weird character \u2013'
Now that you have Unicode, you can safely encode into utf-8.
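The same steps in Python 3, where the literal has to be a bytes object for .decode() to exist. This is a sketch reusing the example string above: it decodes with windows-1252 and then re-encodes as UTF-8.

```python
# In Python 3 the literal must be bytes for .decode() to exist.
raw = b"my weird character \x96"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    reason = exc.reason  # "invalid start byte", at position 19 as in the traceback

# 0x96 is an en dash (U+2013) in windows-1252.
text = raw.decode("windows-1252")
utf8_bytes = text.encode("utf-8")  # now safe to re-encode as UTF-8
print(repr(text))   # 'my weird character –'
print(utf8_bytes)   # b'my weird character \xe2\x80\x93'
```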
Answer by Sushmita
If it still throws the same error after you have tried all of the workarounds above, try exporting the file as CSV (again, if you already have). Especially if you're using scikit-learn, it is best to import the dataset as a CSV file.
I spent hours on this, whereas the solution was this simple. Export the file as a CSV to the directory where Anaconda or your classifier tools are installed and try again.
Answer by Coral
Try the below code snippet:
with open(path, 'rb') as f:
    text = f.read()
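Binary mode avoids the error because nothing gets decoded: f.read() returns bytes. You still have to decode those bytes yourself with the right codec before treating them as text. The self-contained sketch below writes a hypothetical cp1252-encoded sample file first. The file name and the cp1252 guess are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    # Hypothetical sample file containing a cp1252 en dash (0x96).
    with open(path, "wb") as f:
        f.write(b"10\x9620 kg")

    with open(path, "rb") as f:
        data = f.read()  # bytes: nothing has been decoded yet

# Decode explicitly once you know (or have guessed) the real encoding.
text = data.decode("cp1252")
print(type(data).__name__, text)  # bytes 10–20 kg
```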
Answer by aaronpenne
As of 2018-05 this is handled directly with decode, at least for Python 3.
I'm using the snippet below for invalid start byte and invalid continuation byte errors. Adding errors='ignore' fixed it for me:
with open(out_file, 'rb') as f:
    for line in f:
        print(line.decode(errors='ignore'))
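Keep in mind that errors='ignore' silently drops every byte it cannot decode, so the data is lost rather than repaired. Here is a quick Python 3 comparison on hypothetical Latin-1 input:

```python
raw = b"caf\xe9 au lait"  # hypothetical Latin-1 bytes; 0xe9 is "é"

# bytes.decode() defaults to UTF-8; 'ignore' discards the undecodable byte.
lossy = raw.decode(errors="ignore")
# Decoding with the codec the data was actually written in keeps it.
exact = raw.decode("latin-1")

print(lossy)  # caf au lait
print(exact)  # café au lait
```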
Answer by MSalty
I fixed this simply by specifying a different codec in the read_csv() command:
encoding = 'unicode_escape'
E.g.:
import pandas as pd
data = pd.read_csv(filename, encoding= 'unicode_escape')
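A caveat on this one: unicode_escape never fails on stray bytes because it reads every byte as Latin-1. However, it also interprets backslash escape sequences, so a field containing a literal backslash (a Windows path, say) can be silently changed. If the file's backslashes matter, encoding='latin-1' or the file's real encoding (for example cp1252) gives the same characters without that side effect. Here is a stdlib-only sketch of the difference on hypothetical bytes, with no pandas needed:

```python
raw_name = b"caf\xe9"  # hypothetical Latin-1 bytes; 0xe9 is "é"
raw_path = b"C:\\new"  # six bytes: C, :, backslash, n, e, w

# For ordinary bytes, unicode_escape gives the same result as Latin-1.
same = raw_name.decode("unicode_escape") == raw_name.decode("latin-1")

# But the backslash-n pair is turned into a single newline character.
escaped = raw_path.decode("unicode_escape")
latin = raw_path.decode("latin-1")

print(same)                      # True
print(len(escaped), len(latin))  # 5 6
```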
Answer by Punnerud
Inspired by @aaronpenne and @Soumyaansh
with open("file.txt", "rb") as f:
    text = f.read().decode(errors='replace')
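Unlike 'ignore', errors='replace' leaves a visible marker (U+FFFD) where each bad sequence was. Python 3 also offers the 'backslashreplace' handler for decoding, which keeps the original byte value visible. A short sketch on hypothetical bytes:

```python
raw = b"caf\xe9 au lait"  # hypothetical Latin-1 bytes read in binary mode

# 'replace' substitutes U+FFFD for each undecodable sequence.
text = raw.decode(errors="replace")
# 'backslashreplace' writes the offending byte as a \xNN escape instead.
visible = raw.decode(errors="backslashreplace")

print(text)     # caf� au lait
print(visible)  # caf\xe9 au lait
```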
Answer by Zuo
If the above methods are not working for you, you may want to look into changing the encoding of the CSV file itself.
Using Excel:
- Open csv file using Excel
- Navigate to "File menu" option and click "Save As"
- Click "Browse" to select a location to save the file
- Enter intended filename
- Select CSV (Comma delimited) (*.csv) option
- Click "Tools" drop-down box and click "Web Options"
- Under "Encoding" tab, select the option Unicode (UTF-8) from "Save this document as" drop-down list
- Save the file
Using Notepad:
- Open the CSV file using Notepad
- Navigate to "File" > "Save As"
- Next, select the location for the file
- Set the "Save as type" option to All Files (*.*)
- Specify the file name with a .csv extension
- From the "Encoding" drop-down list, select UTF-8.
- Click Save to save the file
By doing this, you should be able to import CSV files without encountering the UnicodeDecodeError.
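The same re-encoding can also be scripted instead of done by hand. Here is a minimal Python 3 sketch. It assumes the source file is cp1252, which is often the case for CSVs saved by Excel on Western-locale Windows, but check yours. The helper name and file names are hypothetical.

```python
import os
import tempfile

def convert_to_utf8(src, dst, src_encoding="cp1252"):
    """Re-encode a text file from src_encoding to UTF-8 (hypothetical helper)."""
    # newline="" leaves the original line endings untouched, as csv expects.
    with open(src, "r", encoding=src_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)

# Usage with a throwaway sample file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data_cp1252.csv")
    dst = os.path.join(tmp, "data_utf8.csv")
    with open(src, "wb") as f:
        f.write(b"name,price\r\ncaf\xe9,\xa5100\r\n")
    convert_to_utf8(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted)  # b'name,price\r\ncaf\xc3\xa9,\xc2\xa5100\r\n'
```

Reading the whole file into memory keeps the sketch short. For very large files, you would copy in chunks instead.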

