Original question: http://stackoverflow.com/questions/40899021/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
output groupby to csv file pandas
Asked by Jessica
I have a sample dataset:
import pandas as pd
df = {'ID': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6'],
      'AA1': ['C', 'B', 'B', 'X', 'G', 'G'],
      'AA2': ['W', 'K', 'K', 'A', 'B', 'B'],
      'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6']
      }
df = pd.DataFrame(df)
It looks like this:
df
Out[32]:
  AA1 AA2  ID name
0   C   W  H1   n1
1   B   K  H2   n2
2   B   K  H3   n3
3   X   A  H4   n4
4   G   B  H5   n5
5   G   B  H6   n6
I want to group by AA1 and AA2 so that each unique AA1/AA2 pair appears once; it doesn't matter which ID and name values are kept for each pair. I then want to write the result to a .csv file, so that the file looks like:
AA1 AA2 ID name
C W H1 n1
B K H2 n2
X A H4 n4
G B H5 n5
I tried this code:
df.groupby('AA1','AA2').apply(to_csv('merged.txt', sep = '\t', index=False))
But to_csv was not recognized. What can I put in .apply() to write the groupby results to a CSV file?
Answered by Julien Marrec
The problem is that you are trying to apply a function to_csv which doesn't exist: to_csv is a method, not a standalone function. In any case, a groupby object doesn't have a to_csv method either; pd.Series and pd.DataFrame do.
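To illustrate the point about where to_csv lives: iterating over a groupby yields (key, sub-DataFrame) pairs, and each sub-DataFrame is an ordinary DataFrame with a to_csv method. The following is a minimal sketch using the sample data from the question. The in-memory buffers and the `buffers` dict are illustrative assumptions standing in for real file paths; they are not part of the original answer.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'ID': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6'],
    'AA1': ['C', 'B', 'B', 'X', 'G', 'G'],
    'AA2': ['W', 'K', 'K', 'A', 'B', 'B'],
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6'],
})

# Grouping by a list of columns yields tuple keys such as ('B', 'K').
# Each group is a plain DataFrame, so to_csv can be called on it.
buffers = {}  # hypothetical container: one CSV string per group
for key, group in df.groupby(['AA1', 'AA2']):
    buf = io.StringIO()  # stand-in for a file path (assumption)
    group.to_csv(buf, sep='\t', index=False)
    buffers[key] = buf.getvalue()
```

This writes one output per group, which is not what the question asks for; it only shows that to_csv belongs to the DataFrame pieces rather than to the groupby object itself.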
What you should really use here is drop_duplicates, and then export the resulting DataFrame to CSV:
df.drop_duplicates(['AA1','AA2']).to_csv('merged.txt')
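The one-liner above uses the to_csv defaults, which write a comma-separated file that includes the index column. The question's attempt passed sep='\t' and index=False, so a sketch that keeps those options might look like the following. Writing to an io.StringIO buffer instead of 'merged.txt' is an assumption made so the example can be run without touching the filesystem; the variable names are illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'ID': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6'],
    'AA1': ['C', 'B', 'B', 'X', 'G', 'G'],
    'AA2': ['W', 'K', 'K', 'A', 'B', 'B'],
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6'],
})

# Keep the first row seen for each unique (AA1, AA2) pair.
unique_pairs = df.drop_duplicates(subset=['AA1', 'AA2'])

# Tab-separated, without the index column, as in the question's attempt.
buf = io.StringIO()  # replace with a path such as 'merged.txt' to write a file
unique_pairs.to_csv(buf, sep='\t', index=False)
csv_text = buf.getvalue()
```

By default drop_duplicates keeps the first occurrence of each pair, which matches the rows shown in the question's desired output (H1, H2, H4, H5).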
PS: If you really want a groupby solution, here is one; it happens to be 12 times slower than drop_duplicates:
df.groupby(['AA1','AA2']).agg(lambda x:x.value_counts().index[0]).to_csv('merged.txt')
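The agg call above picks, for each column in each group, the first entry of value_counts. Since the question says any ID and name will do, a simpler groupby variant is to take the first row of each group. Note that this is a swapped-in technique, not the one from the answer, and no timing claim is made for it. The sketch below assumes the question's sample data and again writes to an in-memory buffer rather than a file.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'ID': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6'],
    'AA1': ['C', 'B', 'B', 'X', 'G', 'G'],
    'AA2': ['W', 'K', 'K', 'A', 'B', 'B'],
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6'],
})

# sort=False keeps groups in order of first appearance;
# as_index=False keeps AA1 and AA2 as regular columns.
deduped = df.groupby(['AA1', 'AA2'], sort=False, as_index=False).first()

buf = io.StringIO()  # stand-in for a file path (assumption)
deduped.to_csv(buf, sep='\t', index=False)
csv_text = buf.getvalue()
```

One difference worth knowing: first() takes the first non-null value per column within each group, so on data with missing values it can combine values from different rows, whereas drop_duplicates always keeps whole rows.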