Original source: http://stackoverflow.com/questions/26098114/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Importing a CSV file in pandas into a pandas dataframe
Asked by user7289
I have a CSV file taken from a SQL dump that looks like the below (first few lines using head file.csv from terminal):
??AANAT,AANAT1576,4
AANAT,AANAT1704,1
AAP,AAP-D-12-00691,8
AAP,AAP-D-12-00834,3
When I use the pd.read_csv('file.csv') command I get an error "ValueError: No columns to parse from file".
Any ideas on how to import the CSV file into a table and avoid the error?
ELABORATION OF QUESTION (following Ed's comment)
I have tried header = None, skiprows=1 to avoid the ?? (which appear when using the head command from the terminal).
The file path to the extract is http://goo.gl/jyYlIK
Answered by EdChum
The ?? characters you see are non-printable characters. Looking at your raw CSV file in a hex editor shows that they are the bytes FF FE, which is the byte-order mark (BOM) for UTF-16 little endian.
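If no hex editor is at hand, the same check can be sketched in Python by reading the first few bytes of the file and comparing them with the BOM constants in the standard codecs module. The helper name sniff_bom is hypothetical and not part of pandas; it only recognises files that actually start with a BOM.

```python
import codecs


def sniff_bom(path):
    """Return a codec name guessed from the file's leading bytes, or None.

    Hypothetical helper for illustration; it only detects a BOM and
    cannot guess the encoding of a file that has none.
    """
    with open(path, "rb") as fh:
        head = fh.read(4)
    # Check UTF-32 first: the UTF-32 LE BOM (FF FE 00 00) begins with
    # the UTF-16 LE BOM (FF FE).
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return None
```

For a file that starts with FF FE followed by ordinary text, as described above, sniff_bom would return "utf-16", which is a value that can be handed to the encoding parameter of pd.read_csv.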
So all you need to do is to pass this as the encoding type and it reads in fine:
In [46]:
df = pd.read_csv('otherfile.csv', encoding='utf-16', header=None)
df
Out[46]:
       0               1   2
0  AANAT       AANAT1576   4
1  AANAT       AANAT1704   1
2    AAP  AAP-D-12-00691   8
3    AAP  AAP-D-12-00834   3
4    AAP  AAP-D-13-00215  10
5    AAP  AAP-D-13-00270   7
6    AAP  AAP-D-13-00435   5
7    AAP  AAP-D-13-00498   4
8    AAP  AAP-D-13-00530   0
9    AAP  AAP-D-13-00747   3
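To try this without the original extract, the following sketch writes a few of the rows from the question to a temporary UTF-16 file and reads them back. The three sample rows, the temporary file, and the column names passed to names= are assumptions made for illustration; the real file has no header and its column meanings are not stated in the question.

```python
import os
import tempfile

import pandas as pd

# Sample rows copied from the question; the real file is larger.
rows = "AANAT,AANAT1576,4\nAANAT,AANAT1704,1\nAAP,AAP-D-12-00691,8\n"

# Write the rows as UTF-16. Python's "utf-16" codec prepends a BOM,
# which mimics the file described in the question.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as fh:
    fh.write(rows.encode("utf-16"))

# header=None because the file has no header row. The column names
# below are made up for this example.
df = pd.read_csv(
    path,
    encoding="utf-16",
    header=None,
    names=["journal", "manuscript", "count"],
)
os.remove(path)

print(df)
```

Passing names= is optional; without it pandas labels the columns 0, 1, 2 as in the output above.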

