pandas UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 3: invalid continuation byte

Question

Asked by Josephine M. Ho

I'm trying to load a CSV file using pd.read_csv, but I get the following Unicode error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 3: invalid continuation byte

Answer 1

Answered by bobince

Unfortunately, CSV files have no built-in method of signalling character encoding.

read_csv defaults to guessing that the bytes in the CSV file represent text encoded as UTF-8. This results in a UnicodeDecodeError if the file uses some other encoding whose bytes don't happen to form a valid UTF-8 sequence. (If by luck they did happen to be valid UTF-8, you wouldn't get the error, but you'd still get wrong input for non-ASCII characters, which would really be worse.)
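
Both cases can be reproduced with plain byte strings, no CSV file needed. The snippet below is only an illustrative sketch: the sample strings are made up, and cp1252 is assumed as the "other" encoding.

```python
# Case 1: bytes written as cp1252, then decoded as UTF-8.
raw = "café au lait".encode("cp1252")  # 'é' becomes the single byte 0xe9

error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Case 2: cp1252 bytes that happen to be valid UTF-8.
# The two characters 'Ã©' are stored as 0xc3 0xa9, which UTF-8 reads as one 'é'.
lucky = "Ã©".encode("cp1252")
silently_wrong = lucky.decode("utf-8")
print(repr(silently_wrong))  # no exception, but not the text that was written
```

The first case fails loudly with the same kind of message as in the question; the second decodes without complaint and quietly yields the wrong characters.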

It's up to you to specify what encoding is in play, which requires some knowledge (or guessing) of where the file came from. For example, if it came from MS Excel on a Western install of Windows, it would probably be Windows code page 1252 and you could read it with:

pd.read_csv('../filename.csv', encoding='cp1252')
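
If the origin of the file is unknown, one rough way to narrow things down is to try a few candidate encodings on the raw bytes. This is a sketch, not a reliable detector: the helper name first_decodable and the candidate list are made up for illustration, and decoding without an error does not prove the guess is right (latin-1, for instance, accepts any byte sequence).

```python
def first_decodable(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes raw without error.

    Hypothetical helper: a successful decode is only a hint, not proof.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Sample bytes standing in for the contents of a file opened in 'rb' mode.
sample = "café au lait".encode("cp1252")
guess = first_decodable(sample)
print(guess)
```

The resulting name can then be passed along as pd.read_csv(path, encoding=guess), after checking by eye that the non-ASCII characters in the loaded data look right.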

Answer 2

Answered by rahul ranjan

I got the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 51: invalid continuation byte

This happened because I had made changes to the file and its encoding. You could also try converting the file to UTF-8, either with some code or with the nqq editor on Ubuntu, which provides an option to change the encoding. If the problem remains, try undoing all the changes made to the file, or change the directory.
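
Converting "with some code" could look roughly like the sketch below. It assumes the file's current encoding is already known (cp1252 is used purely as an example), and the function name reencode_to_utf8 and the file names are hypothetical.

```python
import os
import tempfile


def reencode_to_utf8(src_path, dst_path, src_encoding):
    """Read src_path as src_encoding and write the same text to dst_path as UTF-8."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demo on a throwaway file so the sketch runs on its own.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "legacy.csv")
dst_path = os.path.join(workdir, "utf8.csv")

with open(src_path, "wb") as f:
    f.write("name,city\nRené,Zürich\n".encode("cp1252"))

reencode_to_utf8(src_path, dst_path, "cp1252")

with open(dst_path, "rb") as f:
    converted = f.read()
print(converted.decode("utf-8"))
```

Writing to a separate destination file keeps the original intact in case the assumed source encoding turns out to be wrong.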

Hope this helps

Answer 3

Answered by Happy Happy

Copy the code, open a new .py file, paste the code in, and save.
