Python UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte

Question

Asked by Dipak Ingole

I am using Python 2.6 CGI scripts, but I found this error in the server log while calling json.dumps():

Traceback (most recent call last):
  File "/etc/mongodb/server/cgi-bin/getstats.py", line 135, in <module>
    print json.dumps(__getdata())
  File "/usr/lib/python2.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte

Here, the __getdata() function returns a dictionary ({}).

Before posting this question, I referred to this question on SO.

UPDATES

The following lines are tripping up the JSON encoder:

now = datetime.datetime.now()
now = datetime.datetime.strftime(now, '%Y-%m-%dT%H:%M:%S.%fZ')
print json.dumps({'current_time': now})  # this is the culprit

I found a temporary fix for it:

print json.dumps( {'old_time': now.encode('ISO-8859-1').strip() })

But I am not sure whether this is the correct way to do it.

Answer 1

Accepted answer by Santosh Ghimire

The error occurs because the dictionary contains a non-ASCII character that can't be encoded/decoded. One simple way to avoid this error is to encode such strings with the encode() function, as follows (where a is the string containing the non-ASCII character):

a.encode('utf-8').strip()
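
Before picking an encoding, it can help to find out which value in the dictionary actually holds the undecodable bytes. The following is a minimal sketch for Python 3, where text (str) and raw bytes are separate types. The helper find_undecodable and the sample dictionary are hypothetical, and ISO-8859-1 is assumed only because the question's workaround used it.

```python
import json


def find_undecodable(data, encoding="utf-8"):
    """Return the keys whose values are bytes that cannot be decoded."""
    bad_keys = []
    for key, value in data.items():
        if isinstance(value, bytes):
            try:
                value.decode(encoding)
            except UnicodeDecodeError:
                bad_keys.append(key)
    return bad_keys


# Hypothetical sample: 0xa5 is not a valid UTF-8 start byte.
sample = {"name": b"plain ascii", "price": b"\xa5 100"}
bad = find_undecodable(sample)

# Decode every bytes value explicitly before serializing.
# Assumption: the offending values are ISO-8859-1, the rest are UTF-8.
cleaned = {}
for key, value in sample.items():
    if isinstance(value, bytes):
        source_encoding = "ISO-8859-1" if key in bad else "utf-8"
        cleaned[key] = value.decode(source_encoding)
    else:
        cleaned[key] = value

encoded = json.dumps(cleaned)
```

Once every value is a str, json.dumps no longer has to guess an encoding, so the decision about which codec applies is made in one visible place.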

Answer 2

Answer by Dipak Ingole

The following lines are tripping up the JSON encoder:

now = datetime.datetime.now()
now = datetime.datetime.strftime(now, '%Y-%m-%dT%H:%M:%S.%fZ')
print json.dumps({'current_time': now})  # this is the culprit

I found a temporary fix for it:

print json.dumps( {'old_time': now.encode('ISO-8859-1').strip() })

I am marking this as correct as a temporary fix (though I am not sure it is right).

Answer 3

Answer by HimalayanCoder

Set the default encoding at the top of your code:

import sys
reload(sys)
sys.setdefaultencoding("ISO-8859-1")

Answer 4

Answer by JCF

Your string has a non-ASCII character encoded in it.

Failing to decode with utf-8 can happen if you've had to use other encodings in your code. For example:

>>> 'my weird character \x96'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 19: invalid start byte

In this case, the encoding is windows-1252, so you have to do:

>>> 'my weird character \x96'.decode('windows-1252')
u'my weird character \u2013'

Now that you have Unicode, you can safely encode into utf-8.
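
The session above is Python 2. A rough Python 3 equivalent is sketched below, assuming the data arrives as a bytes object; the variable names are illustrative only.

```python
import json

raw = b"my weird character \x96"  # bytes as they might arrive from a file

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False  # 0x96 cannot start a UTF-8 sequence

text = raw.decode("windows-1252")  # 0x96 is an en dash in windows-1252
utf8_bytes = text.encode("utf-8")  # re-encode the text as UTF-8
payload = json.dumps({"message": text})
```

The point is the same as in the Python 2 session: decode with the encoding the bytes were really written in, and only then encode to UTF-8 or hand the text to json.dumps.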

Answer 5

Answer by Sushmita

If it still throws the same error after you have tried all the workarounds above, try exporting the file as CSV (a second time, if you already have). Especially if you're using scikit-learn, it is best to import the dataset as a CSV file.

I spent hours on this, and the solution turned out to be this simple. Export the file as a CSV to the directory where Anaconda or your classifier tools are installed, then try again.

Answer 6

Answer by Coral

Try the code snippet below:

with open(path, 'rb') as f:
  text = f.read()
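
Opening the file in 'rb' mode avoids the error because no decoding happens at read time; the result is a bytes object that still has to be decoded somewhere. Here is one possible sketch of that second step, assuming Python 3. The helper read_text, the list of candidate encodings, and the throwaway file are all hypothetical.

```python
import os
import tempfile


def read_text(path, encodings=("utf-8", "cp1252")):
    """Read a file as bytes, then try each candidate encoding in turn."""
    with open(path, "rb") as f:
        raw = f.read()  # binary mode performs no decoding
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: mark undecodable bytes with U+FFFD instead of failing.
    return raw.decode(encodings[0], errors="replace"), encodings[0]


# Demonstration with a throwaway file containing a cp1252 en dash (0x96).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"range 1\x962")
text, used = read_text(tmp.name)
os.remove(tmp.name)
```

Trying UTF-8 first is a deliberate choice: it rejects most text written in other encodings, whereas a single-byte codec such as cp1252 accepts almost any input and would hide the mismatch.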

Answer 7

Answer by aaronpenne

As of 2018-05 this is handled directly with decode, at least for Python 3.

I'm using the snippet below for "invalid start byte" and "invalid continuation byte" errors. Adding errors='ignore' fixed it for me.

with open(out_file, 'rb') as f:
    for line in f:
        print(line.decode(errors='ignore'))
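
To make the trade-off visible, here is a small sketch comparing the error handlers on the same input, assuming Python 3. The sample bytes are made up; they are Latin-1 text that happens to trigger both kinds of error mentioned above.

```python
raw = b"caf\xe9 \xa5100"  # Latin-1 bytes, not valid UTF-8

ignored = raw.decode("utf-8", errors="ignore")    # bad bytes silently dropped
replaced = raw.decode("utf-8", errors="replace")  # bad bytes become U+FFFD

try:
    raw.decode("utf-8")  # errors="strict" is the default
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

With errors='ignore' the characters are lost without a trace, while errors='replace' leaves a visible marker where data was dropped, which is often easier to debug later.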

Answer 8

Answer by MSalty

I fixed this simply by specifying a different codec in the read_csv() command:

encoding = 'unicode_escape'

E.g.:

import pandas as pd
data = pd.read_csv(filename, encoding='unicode_escape')
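
It may be worth checking what the unicode_escape codec does to your data before relying on it. The sketch below uses only the Python 3 standard library (no pandas) and made-up sample bytes; it shows that bytes above 0x7f are read as Latin-1 and that backslash sequences in the data are interpreted as escapes.

```python
raw = b"\xa5100"
decoded = raw.decode("unicode_escape")  # high bytes are read as Latin-1

# Side effect: backslash sequences in the data are treated as escapes.
path_bytes = b"C:\\temp\\new"  # the literal text C:\temp\new
mangled = path_bytes.decode("unicode_escape")  # \t and \n become control chars
preserved = path_bytes.decode("latin-1")       # backslashes kept as-is
```

If the file contains literal backslashes (Windows paths, regular expressions), a plain single-byte encoding such as latin-1 or cp1252 is probably the safer choice.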

Answer 9

Answer by Punnerud

Inspired by @aaronpenne and @Soumyaansh

with open("file.txt", "rb") as f:  # the with block closes the file for you
    text = f.read().decode(errors='replace')

Answer 10

Answer by Zuo

If the above methods are not working for you, you may want to look into changing the encoding of the csv file itself.

Using Excel:

1. Open the CSV file in Excel.
2. Go to the "File" menu and click "Save As".
3. Click "Browse" to select a location to save the file.
4. Enter the intended filename.
5. Select the CSV (Comma delimited) (*.csv) option.
6. Click the "Tools" drop-down box and click "Web Options".
7. Under the "Encoding" tab, select Unicode (UTF-8) from the "Save this document as" drop-down list.
8. Save the file.

Using Notepad:

1. Open the CSV file in Notepad.
2. Go to "File" > "Save As".
3. Select the location for the file.
4. Set the "Save as type" option to All Files (*.*).
5. Specify the file name with the .csv extension.
6. From the "Encoding" drop-down list, select UTF-8.
7. Click "Save" to save the file.

By doing this, you should be able to import CSV files without encountering the UnicodeDecodeError.
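
The same re-encoding can also be scripted. The following is a minimal sketch for Python 3, assuming the file is currently in cp1252 (the usual legacy encoding on Western European Windows systems; yours may differ). The helper reencode_to_utf8, the file names, and the sample content are hypothetical.

```python
import os
import tempfile


def reencode_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Rewrite a text file as UTF-8, assuming it is currently src_encoding."""
    # newline="" keeps line endings exactly as they are in the source file.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


# Demonstration with throwaway files; paths and CSV content are made up.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "legacy.csv")
dst_path = os.path.join(workdir, "utf8.csv")
with open(src_path, "wb") as f:
    f.write(b"item,price\r\ncaf\xe9,\xa35\r\n")

reencode_to_utf8(src_path, dst_path)
with open(dst_path, "rb") as f:
    converted = f.read()
```

Writing to a separate destination file rather than overwriting the source keeps the original bytes available in case the assumed source encoding turns out to be wrong.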
