Output groupby to a csv file in pandas

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/40899021/

Time: 2020-09-14 02:33:13  Source: igfitidea

python, pandas

Asked by Jessica

I have a sample dataset:

import pandas as pd
df = {'ID': ['H1','H2','H3','H4','H5','H6'],
      'AA1': ['C','B','B','X','G','G'],
      'AA2': ['W','K','K','A','B','B'],
      'name':['n1','n2','n3','n4','n5','n6']
}

df = pd.DataFrame(df)

It looks like:

df
Out[32]: 
   AA1 AA2  ID name
0   C   W  H1   n1
1   B   K  H2   n2
2   B   K  H3   n3
3   X   A  H4   n4
4   G   B  H5   n5
5   G   B  H6   n6

I want to group by AA1 and AA2 (one row per unique AA1/AA2 pair); it doesn't matter which ID and name values are kept for each pair. I then want to write the result to a .csv file, so that the file looks like:

 AA1 AA2  ID name
  C   W  H1   n1
  B   K  H2   n2
  X   A  H4   n4
  G   B  H5   n5

I tried this code:

df.groupby('AA1','AA2').apply(to_csv('merged.txt', sep = '\t', index=False))

but to_csv was not recognized. What can I put in .apply() to just output the groupby results to a csv file?

Answered by Julien Marrec

The problem is that you are trying to apply a function to_csv which doesn't exist. In any case, a groupby object doesn't have a to_csv method either; pd.Series and pd.DataFrame do.

What you should really use here is drop_duplicates, and then export the resulting dataframe to csv:

df.drop_duplicates(['AA1','AA2']).to_csv('merged.txt')
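
For reference, here is a minimal runnable sketch of the drop_duplicates approach applied to the sample data from the question. It assumes the tab separator and index=False that the question's attempt used, and it writes to an in-memory buffer (an illustrative choice, not part of the original answer) so the result can be inspected without touching the disk:

```python
import io

import pandas as pd

df = pd.DataFrame({
    'ID': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6'],
    'AA1': ['C', 'B', 'B', 'X', 'G', 'G'],
    'AA2': ['W', 'K', 'K', 'A', 'B', 'B'],
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6'],
})

# Keep the first row of each unique (AA1, AA2) pair.
unique_pairs = df.drop_duplicates(subset=['AA1', 'AA2'])

# Write to an in-memory buffer; pass a path such as 'merged.txt'
# instead of the buffer to write a real file.
buffer = io.StringIO()
unique_pairs.to_csv(buffer, sep='\t', index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

By default drop_duplicates keeps the first occurrence of each pair, which matches the desired output in the question (H1, H2, H4, H5).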


PS: If you really want a groupby solution, here is one, which happens to be 12 times slower than drop_duplicates:

df.groupby(['AA1','AA2']).agg(lambda x:x.value_counts().index[0]).to_csv('merged.txt')
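
A hedged sketch of how the groupby variant behaves on the sample data: the aggregation moves AA1 and AA2 into the index and sorts the groups by key, so the row order differs from the drop_duplicates result. The reset_index call and the index=False and sep arguments below are additions for illustration, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6'],
    'AA1': ['C', 'B', 'B', 'X', 'G', 'G'],
    'AA2': ['W', 'K', 'K', 'A', 'B', 'B'],
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6'],
})

# For each (AA1, AA2) group, take the most frequent value of every other column.
most_common = (
    df.groupby(['AA1', 'AA2'])
      .agg(lambda x: x.value_counts().index[0])
      .reset_index()  # turn the group keys back into ordinary columns
)

# With no path given, to_csv returns the CSV text as a string.
csv_text = most_common.to_csv(sep='\t', index=False)
print(csv_text)
```

Which ID and name are chosen for a group with ties is not something this sketch relies on; the question states that any value from the group is acceptable.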

Answered by piRSquared

You can use groupby with head:

df.groupby(['AA1', 'AA2']).head(1)
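
The line above only builds the dataframe; to answer the question it still needs to be written out. A minimal sketch, assuming the same sample data and the tab-separated, index-free format from the question (the to_csv call is an addition, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6'],
    'AA1': ['C', 'B', 'B', 'X', 'G', 'G'],
    'AA2': ['W', 'K', 'K', 'A', 'B', 'B'],
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6'],
})

# head(1) returns the first row of every group as a plain DataFrame,
# keeping the original row order and index.
first_rows = df.groupby(['AA1', 'AA2']).head(1)

# A DataFrame has to_csv; with no path it returns the text as a string.
# Pass 'merged.txt' as the first argument to write a file instead.
csv_text = first_rows.to_csv(sep='\t', index=False)
print(csv_text)
```

On this data the result contains the same rows as df.drop_duplicates(['AA1', 'AA2']).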
