pandas DataFrame with a 2-row header and export to CSV

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/24372993/

Time: 2020-09-13 22:11:16  Source: igfitidea

Tags: python, csv, pandas, dataframe

Question by Meloun

I have a DataFrame:

import pandas as pd

df = pd.DataFrame(columns=["AA", "BB", "CC"])
df.loc[0] = ["a", "b", "c1"]
df.loc[1] = ["a", "b", "c2"]
df.loc[2] = ["a", "b", "c3"]

I need to add a second row to the header:

df.columns = pd.MultiIndex.from_tuples(zip(df.columns, ["DD", "EE", "FF"]))

My df is now:

  AA BB  CC
  DD EE  FF
0  a  b  c1
1  a  b  c2
2  a  b  c3

But when I write this DataFrame to a CSV file:

df.to_csv("test.csv", index = False)

I get one more row than expected (the empty ,, line):

AA,BB,CC
DD,EE,FF
,,
a,b,c1
a,b,c2
a,b,c3
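One version-agnostic workaround is to post-process the text that to_csv returns: the extra line, when present, sits directly after the header rows and contains nothing but separators. Below is a minimal sketch, assuming index=False and a one-character separator; strip_blank_header_row is a hypothetical helper, not a pandas function, and it cannot tell that line apart from a genuine first data row whose values are all empty.

```python
import pandas as pd


def strip_blank_header_row(df: pd.DataFrame, sep: str = ",") -> str:
    """Hypothetical helper: render df as CSV text without the empty line
    that some pandas versions write after MultiIndex column headers."""
    lines = df.to_csv(index=False, sep=sep).splitlines()
    nlevels = df.columns.nlevels
    # The spurious row, if present, directly follows the header rows and
    # consists only of separators (e.g. ",," for three columns).
    if len(lines) > nlevels and set(lines[nlevels]) <= {sep}:
        del lines[nlevels]
    return "\n".join(lines) + "\n"


df = pd.DataFrame(
    [["a", "b", "c1"], ["a", "b", "c2"], ["a", "b", "c3"]],
    columns=pd.MultiIndex.from_tuples(list(zip(["AA", "BB", "CC"], ["DD", "EE", "FF"]))),
)

with open("test.csv", "w", newline="") as f:
    f.write(strip_blank_header_row(df))
```

If the installed pandas does not write the extra line, the helper returns the to_csv text unchanged apart from normalizing line endings to "\n".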

Accepted answer by DSM

It's an ugly hack, but if you need something to work Right Now(tm), you could write it out in two parts:

>>> pd.DataFrame(df.columns.tolist()).T.to_csv("noblankrows.csv", mode="w", header=False, index=False)
>>> df.to_csv("noblankrows.csv", mode="a", header=False, index=False)
>>> !cat noblankrows.csv
AA,BB,CC
DD,EE,FF
a,b,c1
a,b,c2
a,b,c3

Answer by Andy Hayden

I think this is a bug in to_csv. If you're looking for workarounds, here are a couple.

To read this CSV back in, specify the header rows*:

In [10]: from io import StringIO

In [11]: csv = """AA,BB,CC
DD,EE,FF
,,
a,b,c1
a,b,c2
a,b,c3"""

In [12]: pd.read_csv(StringIO(csv), header=[0, 1])
Out[12]:
  AA BB  CC
  DD EE  FF
0  a  b  c1
1  a  b  c2
2  a  b  c3

*Strangely, this seems to ignore the blank line.

To write it out, you could write the header first and then append the data:

with open('test.csv', 'w') as f:
    f.write('\n'.join([','.join(h) for h in zip(*df.columns)]) + '\n')
df.to_csv('test.csv', mode='a', index=False, header=False)

Note the header text this produces for the MultiIndex columns:

In [21]: '\n'.join([','.join(h) for h in zip(*df.columns)]) + '\n'
Out[21]: 'AA,BB,CC\nDD,EE,FF\n'
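The join-based header above writes labels verbatim, so a label containing a comma or a double quote would corrupt the file. Here is a sketch of the same write-the-header-then-append approach that lets Python's csv module handle quoting; write_multiheader_csv is a hypothetical helper, and the "F,F" label is made up only to show the quoting.

```python
import csv
import os

import pandas as pd


def write_multiheader_csv(df: pd.DataFrame, path: str) -> None:
    """Hypothetical helper: one CSV row per column level, then the data rows."""
    with open(path, "w", newline="") as f:
        # Use the same line terminator as to_csv's default (os.linesep).
        writer = csv.writer(f, lineterminator=os.linesep)
        # zip(*df.columns) turns [("AA", "DD"), ...] into one tuple per level.
        for level_labels in zip(*df.columns):
            writer.writerow(level_labels)
        # Append the data through the same handle, without pandas' own header.
        df.to_csv(f, index=False, header=False)


df = pd.DataFrame(
    [["a", "b", "c1"], ["a", "b", "c2"]],
    columns=pd.MultiIndex.from_tuples([("AA", "DD"), ("BB", "EE"), ("CC", "F,F")]),
)
write_multiheader_csv(df, "test.csv")
```

The second header line comes out as DD,EE,"F,F", which read_csv parses back as a single field.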

Answer by Bluu

Building on top of @DSM's solution:

If you need (as I did) to apply the same hack when exporting to Excel, the main change needed (apart from the expected differences in the to_excel method) is to actually remove the MultiIndex used for your column labels.

That's because, unlike .to_csv, .to_excel doesn't support writing out a DataFrame that has MultiIndex columns but no index (i.e. when passing index=False to .to_excel).

Anyway, here's what it would look like:

>>> import sys
>>> writer = pd.ExcelWriter("noblankrows.xlsx")
>>> headers = pd.DataFrame(df.columns.tolist()).T
>>> headers.to_excel(
        writer, header=False, index=False)
>>> df.columns = pd.Index(range(len(df.columns)))  # that's what I was referring to...
>>> df.to_excel(
        writer, header=False, index=False, startrow=len(headers))
>>> writer.close()  # older pandas also had writer.save(), removed in pandas 2.0
>>> pd.read_excel("noblankrows.xlsx").to_csv(sys.stdout, index=False)
AA,BB,CC
DD,EE,FF
a,b,c1
a,b,c2
a,b,c3
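DSM's two-step write and the Excel variant above share one idea: turn the column levels into ordinary data rows, then write everything with header=False, index=False. The sketch below builds that flattened frame once and leaves the original df untouched (the Excel variant overwrites df.columns); flatten_header_rows is a hypothetical name. The result can go to to_csv or, with an Excel engine such as openpyxl installed, to to_excel with the same two arguments. All columns of the flattened frame end up with object dtype, since header strings and data share them.

```python
import pandas as pd


def flatten_header_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a new frame whose first rows are the column levels."""
    # One row per column level: ("AA", "BB", "CC") then ("DD", "EE", "FF").
    header_rows = pd.DataFrame(list(zip(*df.columns)))
    # Same data with plain 0..n-1 column labels, so both frames line up.
    body = pd.DataFrame(df.to_numpy())
    return pd.concat([header_rows, body], ignore_index=True)


df = pd.DataFrame(
    [["a", "b", "c1"], ["a", "b", "c2"], ["a", "b", "c3"]],
    columns=pd.MultiIndex.from_tuples([("AA", "DD"), ("BB", "EE"), ("CC", "FF")]),
)
flat = flatten_header_rows(df)
flat.to_csv("noblankrows.csv", header=False, index=False)
# flat.to_excel("noblankrows.xlsx", header=False, index=False)  # needs an Excel engine
```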

Answer by CT Zhu

Use df.to_csv("test.csv", index=False, tupleize_cols=True) to get the following CSV (note that tupleize_cols was deprecated in pandas 0.21 and removed in pandas 1.0, so this only works on older versions):

"('AA', 'DD')","('BB', 'EE')","('CC', 'FF')"
a,b,c1
a,b,c2
a,b,c3

To read it back:

df2=pd.read_csv("test.csv", tupleize_cols=True)
df2.columns=pd.MultiIndex.from_tuples(eval(','.join(df2.columns)))

To get the exact output you wanted:

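import numpy as np

# 'w' rather than 'a', so an existing test.csv is overwritten instead of appended to;
# newline='' is what pandas recommends for file handles passed to to_csv
with open('test.csv', 'w', newline='') as f:
    pd.DataFrame(np.asanyarray(df.columns.tolist())).T.to_csv(f, index=False, header=False)
    df.to_csv(f, index=False, header=False)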
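Because tupleize_cols no longer exists in current pandas, the tuple-label round trip can instead be sketched by stringifying the column tuples yourself before writing and parsing them back afterwards. This sketch uses ast.literal_eval rather than eval, since literal_eval only accepts Python literals and so will not execute arbitrary code read from the file.

```python
import ast

import pandas as pd

df = pd.DataFrame(
    [["a", "b", "c1"], ["a", "b", "c2"], ["a", "b", "c3"]],
    columns=pd.MultiIndex.from_tuples([("AA", "DD"), ("BB", "EE"), ("CC", "FF")]),
)

# Write a single header row of labels such as "('AA', 'DD')"
# (to_csv quotes them because they contain commas).
flat_labels = df.set_axis([str(col) for col in df.columns], axis=1)
flat_labels.to_csv("test.csv", index=False)

# Read it back and turn each label string into a tuple again.
df2 = pd.read_csv("test.csv")
df2.columns = pd.MultiIndex.from_tuples([ast.literal_eval(c) for c in df2.columns])
```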