Encoding error in pandas read_csv
Original question: http://stackoverflow.com/questions/30462807/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Encoding Error in pandas read_csv
Asked by khtad
I'm attempting to read a CSV file into a DataFrame in pandas. When I try to do that, I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 55: invalid start byte
The error comes from this code:
import pandas as pd
location = r"C:\Users\khtad\Documents\test.csv"
df = pd.read_csv(location, header=0, quotechar='"')
This is on a Windows 7 Enterprise Service Pack 1 machine, and it seems to apply to every CSV file I create. In this particular case, the byte at position 55 is 00101001 in binary and the byte at position 54 is 01110011, if that matters.
Saving the file as UTF-8 with a text editor doesn't seem to help, either. Similarly, adding the parameter encoding='utf-8' doesn't work: it returns the same error.
What is the most likely cause of this error and are there any workarounds other than abandoning the DataFrame construct for the moment and using the csv module to read in the CSV line-by-line?
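On the likely cause, the byte named in the error message can be decoded on its own to see what each codec makes of it. The snippet below is only a sketch: the sample bytes are made up for illustration and are not taken from the asker's file. In UTF-8, 0x96 is a continuation byte and cannot start a character, which is what "invalid start byte" reports. In cp1252 the same byte is an en dash, a character that Windows applications often insert automatically.

```python
# Hypothetical cell content: two years joined by the byte 0x96.
raw = b"2010\x962015"

utf8_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0x96 is 10010110 in binary, a UTF-8 continuation byte with no lead byte.
    utf8_error = str(exc)
    print(utf8_error)

# cp1252 maps 0x96 to U+2013 (en dash).
as_cp1252 = raw.decode("cp1252")
# latin1 maps every byte to the code point with the same number,
# so 0x96 becomes U+0096, an invisible control character.
as_latin1 = raw.decode("latin1")

print(repr(as_cp1252))
print(repr(as_latin1))
```

Both decodes succeed, but only one of them yields the character the file's author probably intended, so it is worth checking the resulting text rather than just the absence of an error.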
Answered by maxymoo
Try calling read_csv with encoding='latin1', encoding='iso-8859-1' or encoding='cp1252' (these are some of the various encodings found on Windows).
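One way to apply this suggestion without editing the call by hand is to test a few candidate encodings against the raw bytes first. The following is a minimal sketch, not part of the original answer: the helper name pick_encoding, the candidate order, and the demo file contents are all assumptions made for illustration. It reads the whole file into memory, so it suits small files only.

```python
import os
import tempfile


def pick_encoding(path, candidates=("utf-8", "cp1252", "latin1")):
    """Return the first candidate encoding that decodes the whole file.

    Hypothetical helper. latin1 accepts every byte sequence, so it is
    listed last as a catch-all.
    """
    with open(path, "rb") as handle:
        data = handle.read()
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    raise ValueError(f"none of {candidates} can decode {path}")


# Demo on a throwaway file that contains the byte from the error message.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"period,count\n2010\x962015,3\n")
    demo_path = tmp.name

chosen = pick_encoding(demo_path)
print(chosen)
os.remove(demo_path)

# With pandas installed, the result can be passed straight through:
# df = pd.read_csv(location, encoding=pick_encoding(location))
```

A successful decode does not prove that the encoding is the right one, only that the bytes are valid in it, so the loaded text should still be inspected.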
Answered by sushmit
This works on Mac as well. You can use:
df = pd.read_csv('Region_count.csv', encoding='latin1')