Pandas - write MultiIndex rows with to_csv

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/17349574/

Time: 2020-09-13 20:57:35  Source: igfitidea

python csv pandas multi-index

Question

I am using to_csv to write a MultiIndex DataFrame to CSV files. The CSV file ends up with a single column that contains the MultiIndex entries as tuples, like:

('a', 'x')
('a', 'y')
('a', 'z')
('b', 'x')
('b', 'y')
('b', 'z')

However, I want to write the MultiIndex to two columns instead of one column of tuples, such as:

a, x
 , y
 , z
b, x
 , y
 , z

It looks like tupleize_cols can achieve this for columns, but there is no such option for the rows. Is there a way to achieve this?

Accepted answer by Jeff

I think this will do it:

In [1]: import pandas as pd

In [2]: from pandas import DataFrame

In [3]: df = DataFrame(dict(A='foo', B='bar', value=1), index=range(5)).set_index(['A', 'B'])

In [4]: df
Out[4]: 
         value
A   B         
foo bar      1
    bar      1
    bar      1
    bar      1
    bar      1

In [5]: df.to_csv('test.csv')

In [6]: !cat test.csv
A,B,value
foo,bar,1
foo,bar,1
foo,bar,1
foo,bar,1
foo,bar,1

In [7]: pd.read_csv('test.csv',index_col=[0,1])
Out[7]: 
         value
A   B         
foo bar      1
    bar      1
    bar      1
    bar      1
    bar      1
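
The session above can also be written as a standalone script. The following is a minimal sketch, not part of the original answer: it assumes hypothetical data shaped like the index in the question (outer labels a and b, inner labels x, y, z) and keeps the CSV in memory with io.StringIO instead of writing test.csv.

```python
import io

import pandas as pd

# Hypothetical data: a two-level index like the one in the question.
index = pd.MultiIndex.from_product(
    [["a", "b"], ["x", "y", "z"]], names=["A", "B"]
)
df = pd.DataFrame({"value": range(6)}, index=index)

# With no path argument, to_csv returns the CSV as a string.
# Each index level is written to its own column.
csv_text = df.to_csv()
print(csv_text)

# index_col=[0, 1] rebuilds the two-level index from the first two columns.
roundtrip = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1])
print(roundtrip)
```

Every row carries both labels here, so the file round-trips without any extra handling.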

To write the file with duplicated index entries blanked out (kind of a hack, though):

In [27]: x = df.reset_index()

In [28]: mask = df.index.to_series().duplicated()

In [29]: mask
Out[29]: 
A    B  
foo  bar    False
     bar     True
     bar     True
     bar     True
     bar     True
dtype: bool

In [30]: x.loc[mask.values,['A','B']] = ''

In [31]: x
Out[31]: 
     A    B  value
0  foo  bar      1
1                1
2                1
3                1
4                1

In [32]: x.to_csv('test.csv')

In [33]: !cat test.csv
,A,B,value
0,foo,bar,1
1,,,1
2,,,1
3,,,1
4,,,1
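
The mask above flags rows whose full (A, B) pair repeats. For an index like the one in the question, where every pair is unique and only the outer label repeats, a per-level variant may be closer to the requested layout. This is a minimal sketch under that assumption, not the answer's own method. The data and the names flat, repeated, and sparse_csv are hypothetical.

```python
import pandas as pd

# Hypothetical data: a two-level index like the one in the question.
index = pd.MultiIndex.from_product(
    [["a", "b"], ["x", "y", "z"]], names=["A", "B"]
)
df = pd.DataFrame({"value": range(6)}, index=index)

# Move the index levels into ordinary columns.
flat = df.reset_index()

# True where the outer label equals the label in the row directly above.
repeated = flat["A"].eq(flat["A"].shift())

# Blank the repeated outer labels, keeping the first label of each run.
flat.loc[repeated, "A"] = ""

# index=False leaves out the integer row index added by reset_index.
sparse_csv = flat.to_csv(index=False)
print(sparse_csv)
```

Comparing each label with the row above it, rather than calling duplicated(), blanks only consecutive repeats, so a label that reappears later in the file is still written out.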

Reading it back is actually a bit tricky:

In [37]: pd.read_csv('test.csv',index_col=0).ffill().set_index(['A','B'])
Out[37]: 
         value
A   B         
foo bar      1
    bar      1
    bar      1
    bar      1
    bar      1
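
One caveat with the read-back above is that ffill() on the whole frame also fills missing values in the data columns. The following is a minimal sketch, assuming a hypothetical file in which only the outer label A was blanked. It forward-fills only that column before rebuilding the index.

```python
import io

import pandas as pd

# Hypothetical CSV content with repeated outer labels left blank.
sparse_csv = "A,B,value\na,x,0\n,y,1\n,z,2\nb,x,3\n,y,4\n,z,5\n"

# Blank fields are read as missing values (NaN).
flat = pd.read_csv(io.StringIO(sparse_csv))

# Forward-fill only the blanked index column, leaving data columns as read.
flat["A"] = flat["A"].ffill()

restored = flat.set_index(["A", "B"])
print(restored)
```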