Saving and Loading of dataframe to csv results in Unnamed columns
Original URL: http://stackoverflow.com/questions/19428904/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by idoda
The problem is in the title. Example:
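from pandas import DataFrame, read_csv
x = [('a', 'a', 'c') for i in range(5)]
df = DataFrame(x, columns=['col1', 'col2', 'col3'])
# Write the frame (including its index) and read it straight back
df.to_csv('test.csv')
df1 = read_csv('test.csv')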
   Unnamed: 0 col1 col2 col3
0           0    a    a    c
1           1    a    a    c
2           2    a    a    c
3           3    a    a    c
4           4    a    a    c
The reason seems to be that when a dataframe is saved, the index column is written as well, with no name in the header. Then, when you load the csv again, the index column is loaded as an unnamed column. Is this a bug? How can I avoid writing the csv with the index, or drop unnamed columns when reading?
Answered by Max
You can remove row labels via the index and index_label parameters of to_csv.
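A minimal sketch of the index parameter, assuming a recent pandas and using an in-memory string instead of test.csv so that it leaves no file behind (the io.StringIO round trip is an illustration choice, not part of the answer):

```python
import io
import pandas as pd

x = [('a', 'a', 'c') for _ in range(5)]
df = pd.DataFrame(x, columns=['col1', 'col2', 'col3'])

# index=False leaves the row labels out of the file entirely,
# so there is no nameless first column to read back.
csv_text = df.to_csv(index=False)
df1 = pd.read_csv(io.StringIO(csv_text))

print(csv_text.splitlines()[0])  # col1,col2,col3
print(list(df1.columns))         # ['col1', 'col2', 'col3']
```

This fits when the index is just the default 0..n-1 range and carries no information worth keeping.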
Answered by Jeff
These are not symmetric, as there are ambiguities in the csv format because of the positioning. You need to specify an index_col on read-back.
In [1]: x=[('a','a','c') for i in range(5)]
In [2]: df = DataFrame(x,columns=['col1','col2','col3'])
In [3]: df.to_csv('test.csv')
In [4]: !cat test.csv
,col1,col2,col3
0,a,a,c
1,a,a,c
2,a,a,c
3,a,a,c
4,a,a,c
In [5]: pd.read_csv('test.csv',index_col=0)
Out[5]:
  col1 col2 col3
0    a    a    c
1    a    a    c
2    a    a    c
3    a    a    c
4    a    a    c
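The same read-back can be sketched as a plain script, comparing the default read with the index_col=0 read. The in-memory io.StringIO buffer and the variable names are assumptions made for illustration:

```python
import io
import pandas as pd

x = [('a', 'a', 'c') for _ in range(5)]
df = pd.DataFrame(x, columns=['col1', 'col2', 'col3'])

# The default to_csv() writes the index as a first column with an empty header.
csv_text = df.to_csv()

# Without index_col, that nameless column comes back as a data column.
default_read = pd.read_csv(io.StringIO(csv_text))
# With index_col=0, the first column is used as the index again.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_read.columns))  # ['Unnamed: 0', 'col1', 'col2', 'col3']
print(list(restored.columns))      # ['col1', 'col2', 'col3']
```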
With a named index, the file looks very similar to the one written in In [3], so is 'foo' a column or an index?
In [6]: df.index.name = 'foo'
In [7]: df.to_csv('test.csv')
In [8]: !cat test.csv
foo,col1,col2,col3
0,a,a,c
1,a,a,c
2,a,a,c
3,a,a,c
4,a,a,c
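To make the ambiguity concrete, here is a small sketch that reads the same text both ways. Nothing in the file says which reading is intended, so the caller has to decide; the io.StringIO buffer and the variable names are illustrative assumptions:

```python
import io
import pandas as pd

x = [('a', 'a', 'c') for _ in range(5)]
df = pd.DataFrame(x, columns=['col1', 'col2', 'col3'])
df.index.name = 'foo'
csv_text = df.to_csv()

# Read with defaults: 'foo' is treated as an ordinary data column.
as_column = pd.read_csv(io.StringIO(csv_text))
# Read with index_col='foo': the same column becomes the index.
as_index = pd.read_csv(io.StringIO(csv_text), index_col='foo')

print(list(as_column.columns))  # ['foo', 'col1', 'col2', 'col3']
print(as_index.index.name)      # foo
print(list(as_index.columns))   # ['col1', 'col2', 'col3']
```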
Answered by Денис Волконский
That's how to use the index:
df.to_csv('test.csv', index_label=False)
But for me, when I tried to submit to Kaggle, it returned the error "ERROR: Record 1 had 3 columns but expected 2", so I solved it using this code.
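A sketch of why a strict CSV consumer may reject that file (this is an illustration, not the answerer's own code): index_label=False only drops the header field for the index, while the index values are still written in every record, so the records end up one field longer than the header. The two-column frame and its column names below are hypothetical, chosen to match the counts in the error message.

```python
import pandas as pd

# Hypothetical two-column frame, e.g. a submission file
df = pd.DataFrame({'id': [1, 2], 'label': ['a', 'b']})

# index_label=False: no header field for the index, but index values remain.
with_label_false = df.to_csv(index_label=False).splitlines()
print(with_label_false[0])  # id,label
print(with_label_false[1])  # 0,1,a

# index=False: the index is not written at all, so header and records agree.
without_index = df.to_csv(index=False).splitlines()
print(without_index[0])  # id,label
print(without_index[1])  # 1,a
```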

