pandas: saving and loading a DataFrame to CSV results in unnamed columns

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/19428904/

Time: 2020-09-13 21:14:54  Source: igfitidea

Saving and Loading of dataframe to csv results in Unnamed columns

python pandas

Asked by idoda

The problem is described in the title. Example:

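from pandas import DataFrame, read_csv

x = [('a', 'a', 'c') for i in range(5)]
df = DataFrame(x, columns=['col1', 'col2', 'col3'])
df.to_csv('test.csv')        # the index is written as a first column with an empty header
df1 = read_csv('test.csv')   # that column comes back as 'Unnamed: 0'
print(df1)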

   Unnamed: 0 col1 col2 col3
0           0    a    a    c
1           1    a    a    c
2           2    a    a    c
3           3    a    a    c
4           4    a    a    c

The reason seems to be that when a DataFrame is saved, the index column is written too, with no name in the header. When the CSV is loaded again, the index column comes back as an unnamed column. Is this a bug? How can I avoid writing the index to the CSV, or drop unnamed columns when reading?

Answered by Max

You can remove row labels via the index and index_label parameters of to_csv.
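A minimal sketch of the index parameter, assuming a recent pandas version. It writes to an in-memory string rather than to test.csv so that it is self-contained; the variable name csv_text is only for illustration.

```python
import io
import pandas as pd

df = pd.DataFrame([('a', 'a', 'c') for _ in range(5)],
                  columns=['col1', 'col2', 'col3'])

# index=False leaves the row labels out of the output entirely.
csv_text = df.to_csv(index=False)

# Reading the text back yields only the three original columns.
df1 = pd.read_csv(io.StringIO(csv_text))
print(list(df1.columns))
```

With the index left out on write, there is no leading column for read_csv to label as unnamed.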

Answered by Jeff

Writing and reading are not symmetric, because the CSV format is ambiguous about position: the file itself does not say whether the first column is an index or data. You need to specify index_col on read-back.

In [1]: x=[('a','a','c') for i in range(5)]

In [2]: df = DataFrame(x,columns=['col1','col2','col3'])

In [3]: df.to_csv('test.csv')

In [4]: !cat test.csv
,col1,col2,col3
0,a,a,c
1,a,a,c
2,a,a,c
3,a,a,c
4,a,a,c

In [5]: pd.read_csv('test.csv',index_col=0)
Out[5]: 
  col1 col2 col3
0    a    a    c
1    a    a    c
2    a    a    c
3    a    a    c
4    a    a    c

The next file looks very similar to the one above, so is 'foo' a column or an index?

In [6]: df.index.name = 'foo'

In [7]: df.to_csv('test.csv')

In [8]: !cat test.csv
foo,col1,col2,col3
0,a,a,c
1,a,a,c
2,a,a,c
3,a,a,c
4,a,a,c
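The ambiguity can be sketched by reading the same CSV text two ways. This is an illustration assuming a recent pandas version; the names as_column and as_index are hypothetical, and an in-memory string stands in for test.csv.

```python
import io
import pandas as pd

df = pd.DataFrame([('a', 'a', 'c') for _ in range(5)],
                  columns=['col1', 'col2', 'col3'])
df.index.name = 'foo'
csv_text = df.to_csv()

# Without index_col, 'foo' is read as an ordinary data column.
as_column = pd.read_csv(io.StringIO(csv_text))

# With index_col, the same text is read with 'foo' as the index.
as_index = pd.read_csv(io.StringIO(csv_text), index_col='foo')

print(list(as_column.columns))
print(as_index.index.name, list(as_index.columns))
```

Both reads are valid interpretations of the file, which is why read_csv does not guess and leaves the choice to the caller.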

Answered by Денис Волконский

This is how to use the index: df.to_csv('test.csv', index_label=False). However, when I tried to submit the result to Kaggle, it returned the error "ERROR: Record 1 had 3 columns but expected 2", so I solved it using this code.
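The code this answer refers to is not shown on this page. The sketch below is therefore only an assumption about what is going on: index_label=False drops the header field for the index while still writing the index values on each row, whereas index=False omits the index completely. The two-column frame with id and label is a hypothetical stand-in for a submission file.

```python
import pandas as pd

# Hypothetical two-column submission frame (not from the original answer).
df = pd.DataFrame({'id': [1, 2], 'label': [0, 1]})

# index_label=False removes only the header field for the index;
# the index values are still written at the start of every data row.
with_label_false = df.to_csv(index_label=False).splitlines()
print(with_label_false[0])
print(with_label_false[1])

# index=False omits the index from the header and from the rows alike.
without_index = df.to_csv(index=False).splitlines()
print(without_index[0])
print(without_index[1])
```

In the first output the header has two fields but each record has three, which would be consistent with the error message quoted above.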

Answered by piokuc

You can explicitly specify which columns to write using the cols parameter (named columns in current pandas versions).
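A short sketch of selecting columns on write, assuming a recent pandas version where the parameter is spelled columns rather than cols. Note that this selects data columns only; index=False is still needed to leave out the row labels.

```python
import pandas as pd

df = pd.DataFrame([('a', 'a', 'c') for _ in range(5)],
                  columns=['col1', 'col2', 'col3'])

# Write only col1 and col3, and leave the index out as well.
csv_text = df.to_csv(columns=['col1', 'col3'], index=False)
print(csv_text.splitlines()[0])
```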