Original URL: http://stackoverflow.com/questions/24372993/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): StackOverflow
pandas dataframe with 2-rows header and export to csv
Question by Meloun
I have a dataframe
import pandas as pd

df = pd.DataFrame(columns = ["AA", "BB", "CC"])
df.loc[0]= ["a", "b", "c1"]
df.loc[1]= ["a", "b", "c2"]
df.loc[2]= ["a", "b", "c3"]
I need to add a second row to the header:
df.columns = pd.MultiIndex.from_tuples(zip(df.columns, ["DD", "EE", "FF"]))
My df is now:
  AA BB  CC
  DD EE  FF
0  a  b  c1
1  a  b  c2
2  a  b  c3
But when I write this dataframe to a CSV file:
df.to_csv("test.csv", index = False)
I get one more row than expected (an empty ",," row after the header):
AA,BB,CC
DD,EE,FF
,,
a,b,c1
a,b,c2
a,b,c3
Accepted answer by DSM
It's an ugly hack, but if you needed something to work Right Now(tm), you could write it out in two parts:
>>> pd.DataFrame(df.columns.tolist()).T.to_csv("noblankrows.csv", mode="w", header=False, index=False)
>>> df.to_csv("noblankrows.csv", mode="a", header=False, index=False)
>>> !cat noblankrows.csv
AA,BB,CC
DD,EE,FF
a,b,c1
a,b,c2
a,b,c3
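The two-part write above can be wrapped in a small reusable function. This is a minimal sketch: the name to_csv_multiheader is hypothetical (not a pandas API), and it assumes the data rows can be appended with header=False exactly as in the answer.

```python
import pandas as pd


def to_csv_multiheader(df, path):
    """Hypothetical helper: one CSV header line per column level, no blank row.

    Writes the header first (mode="w"), then appends the data rows (mode="a").
    """
    # Transposing the list of column tuples gives one row per column level.
    header = pd.DataFrame(df.columns.tolist()).T
    header.to_csv(path, mode="w", header=False, index=False)
    # Append the data without pandas' own MultiIndex header.
    df.to_csv(path, mode="a", header=False, index=False)


df = pd.DataFrame(
    [["a", "b", "c1"], ["a", "b", "c2"], ["a", "b", "c3"]],
    columns=pd.MultiIndex.from_tuples([("AA", "DD"), ("BB", "EE"), ("CC", "FF")]),
)
to_csv_multiheader(df, "noblankrows.csv")
with open("noblankrows.csv") as f:
    print(f.read())
```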
Answer by Andy Hayden
I think this is a bug in to_csv. If you're looking for workarounds, here are a couple.
To read this CSV back in, specify the header rows*:
In [11]: from io import StringIO
    ...: csv = "AA,BB,CC\nDD,EE,FF\n,,\na,b,c1\na,b,c2\na,b,c3"
In [12]: pd.read_csv(StringIO(csv), header=[0, 1])
Out[12]:
  AA BB  CC
  DD EE  FF
0  a  b  c1
1  a  b  c2
2  a  b  c3
*Strangely, this seems to ignore the blank line.
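For comparison, here is a minimal sketch of the read side when the file has no blank row at all (the layout the write-side workarounds in this thread produce). It assumes the two header lines map one-to-one onto the two column levels; csv_text is made-up sample data.

```python
from io import StringIO

import pandas as pd

# Two header lines and no blank row.
csv_text = "AA,BB,CC\nDD,EE,FF\na,b,c1\na,b,c2\na,b,c3\n"

# header=[0, 1] turns the first two lines into a two-level column MultiIndex.
df = pd.read_csv(StringIO(csv_text), header=[0, 1])
print(df.columns.tolist())
print(df)
```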
To write it out, you could write the header first and then append:
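# Write one header line per column level first...
with open('test.csv', 'w') as f:
    f.write('\n'.join([','.join(h) for h in zip(*df.columns)]) + '\n')
# ...then append the data rows without pandas' own header.
df.to_csv('test.csv', mode='a', index=False, header=False)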
Note the header string this builds for the MultiIndex columns:
In [21]: '\n'.join([','.join(h) for h in zip(*df.columns)]) + '\n'
Out[21]: 'AA,BB,CC\nDD,EE,FF\n'
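A slightly more defensive variant of the header-then-append idea, sketched under the assumption that column labels may contain commas or quotes: it writes the header lines with Python's csv module, which quotes such labels (the plain ','.join above does not), and then lets pandas append to the same open file. write_multiheader_csv is a hypothetical name, not a pandas function.

```python
import csv
import os

import pandas as pd


def write_multiheader_csv(df, path):
    """Hypothetical helper: one quoted header line per column level, then the data."""
    with open(path, "w", newline="") as f:
        # Match pandas' default line terminator (os.linesep) so the file is consistent.
        writer = csv.writer(f, lineterminator=os.linesep)
        # zip(*df.columns) yields one tuple per level, e.g. ('AA', 'BB', 'CC').
        for level in zip(*df.columns):
            writer.writerow(level)
        # Passing the open handle makes pandas continue writing to the same file.
        df.to_csv(f, index=False, header=False)


df = pd.DataFrame(
    [["a", "b", "c1"], ["a", "b", "c2"], ["a", "b", "c3"]],
    columns=pd.MultiIndex.from_tuples([("AA", "DD"), ("BB", "EE"), ("CC", "FF")]),
)
write_multiheader_csv(df, "test.csv")
```

Because csv.writer uses minimal quoting by default, a label such as "B,B" would come out as "B,B" in double quotes instead of splitting into two fields.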
Answer by Bluu
Building on top of @DSM's solution:
If you need (as I did) to apply the same hack to an export to Excel, the main change needed (apart from the expected differences with the to_excel method) is to actually remove the MultiIndex used for your column labels.
That's because, unlike .to_csv, .to_excel doesn't support writing out a DataFrame that has a MultiIndex for its columns but no index (i.e. when passing index=False to .to_excel).
Anyway, here's what it would look like:
>>> import sys
>>> writer = pd.ExcelWriter("noblankrows.xlsx")
>>> headers = pd.DataFrame(df.columns.tolist()).T
>>> headers.to_excel(
...     writer, header=False, index=False)
>>> df.columns = pd.Index(range(len(df.columns)))  # that's what I was referring to...
>>> df.to_excel(
...     writer, header=False, index=False, startrow=len(headers))
>>> writer.close()  # saves the file; older pandas also had writer.save(), which recent versions removed
>>> pd.read_excel("noblankrows.xlsx").to_csv(sys.stdout, index=False)
AA,BB,CC
DD,EE,FF
a,b,c1
a,b,c2
a,b,c3
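The Excel variant can also be split so that the caller's DataFrame is left unmodified (the code above overwrites df.columns in place). This is a sketch with hypothetical helper names; split_header_for_excel is plain pandas, while write_excel_multiheader assumes an Excel engine such as openpyxl is installed.

```python
import pandas as pd


def split_header_for_excel(df):
    """Hypothetical helper: return (headers, body) frames for a two-step to_excel.

    headers has one row per column level. body is a copy of df whose column
    MultiIndex is replaced by plain integer labels, so it can be written with
    index=False. The caller's df is not modified.
    """
    headers = pd.DataFrame(df.columns.tolist()).T
    body = df.copy()
    body.columns = pd.RangeIndex(len(df.columns))
    return headers, body


def write_excel_multiheader(df, path):
    """Hypothetical helper: write headers, then the data below them, on one sheet."""
    headers, body = split_header_for_excel(df)
    with pd.ExcelWriter(path) as writer:
        headers.to_excel(writer, header=False, index=False)
        body.to_excel(writer, header=False, index=False, startrow=len(headers))
```

Using ExcelWriter as a context manager saves and closes the file on exit, so no explicit save or close call is needed.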
Answer by CT Zhu
In older pandas versions, use df.to_csv("test.csv", index=False, tupleize_cols=True) to get the resulting CSV (the tupleize_cols argument no longer exists in current pandas releases):
"('AA', 'DD')","('BB', 'EE')","('CC', 'FF')"
a,b,c1
a,b,c2
a,b,c3
To read it back:
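import ast
df2 = pd.read_csv("test.csv")
# Each header is a string such as "('AA', 'DD')"; literal_eval parses it safely, unlike eval.
df2.columns = pd.MultiIndex.from_tuples([ast.literal_eval(c) for c in df2.columns])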
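Since tupleize_cols is gone from current pandas, one way to get the same single-line tuple header is to convert the labels to strings yourself before writing. This is a sketch under that assumption, not the original answer's code; it round-trips through an in-memory StringIO buffer instead of test.csv.

```python
import ast
from io import StringIO

import pandas as pd

df = pd.DataFrame(
    [["a", "b", "c1"], ["a", "b", "c2"], ["a", "b", "c3"]],
    columns=pd.MultiIndex.from_tuples([("AA", "DD"), ("BB", "EE"), ("CC", "FF")]),
)

# Flatten each column tuple to its repr, e.g. ('AA', 'DD') -> "('AA', 'DD')".
flat = df.copy()
flat.columns = [str(col) for col in df.columns]
buf = StringIO()
flat.to_csv(buf, index=False)  # the header labels get quoted because they contain commas

# Read back and turn each header string into a tuple again.
buf.seek(0)
df2 = pd.read_csv(buf)
df2.columns = pd.MultiIndex.from_tuples([ast.literal_eval(c) for c in df2.columns])
print(df2)
```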
To get the exact output you wanted:
import numpy as np

with open('test.csv', 'w', newline='') as f:  # 'w' so the file starts empty
    pd.DataFrame(np.asanyarray(df.columns.tolist())).T.to_csv(f, index = False, header=False)
    df.to_csv(f, index = False, header=False)

