Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must likewise follow CC BY-SA, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/36106490/

Date: 2020-09-14 00:54:11  Source: igfitidea

How to get unique values from multiple columns in a pandas groupby

python pandas

Asked by Fabio Lamanna

Starting from this dataframe df:

import pandas as pd
df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2], 'l1': ['a', 'a', 'b', 'c', 'c', 'b'], 'l2': ['b', 'd', 'd', 'f', 'e', 'f']})

   c l1 l2
0  1  a  b
1  1  a  d
2  1  b  d
3  2  c  f
4  2  c  e
5  2  b  f

I would like to perform a groupby over the c column to get the unique values of the l1 and l2 columns. For one column I can do:

g = df.groupby('c')['l1'].unique()

that correctly returns:

c
1    [a, b]
2    [c, b]
Name: l1, dtype: object

but using:

g = df.groupby('c')[['l1', 'l2']].unique()

returns:

AttributeError: 'DataFrameGroupBy' object has no attribute 'unique'
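
The error comes from the type of object each selection returns: selecting one column from a groupby gives a SeriesGroupBy, which has a unique method, while selecting several columns gives a DataFrameGroupBy, which does not. A minimal check, assuming pandas 1.x or 2.x and selecting the two columns with a list (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2],
                   'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
                   'l2': ['b', 'd', 'd', 'f', 'e', 'f']})

grouped = df.groupby('c')

# One column -> SeriesGroupBy, which provides unique()
one_col = grouped['l1']
# Several columns (selected with a list) -> DataFrameGroupBy, which has no unique()
two_cols = grouped[['l1', 'l2']]

print(type(one_col).__name__, hasattr(one_col, 'unique'))
print(type(two_cols).__name__, hasattr(two_cols, 'unique'))
```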

I know I can get the unique values across both columns with, among other approaches:

In [12]: np.unique(df[['l1','l2']])
Out[12]: array(['a', 'b', 'c', 'd', 'e', 'f'], dtype=object)

Is there a way to apply this method to the groupby in order to get something like:

c
1    [a, b, d]
2    [c, b, e, f]
Name: l1, dtype: object

Answer by ayhan

You can do it with apply:

import numpy as np
# Select the columns with a list; np.unique flattens each group's values and sorts them
g = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x)))
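
A self-contained version of the apply approach, as a sketch assuming pandas 1.x or 2.x. The columns are selected with a list ([['l1', 'l2']]) because recent pandas versions reject the tuple-style ['l1','l2'] selection. Since np.unique flattens and sorts its input, each group's list comes back sorted rather than in order of first appearance.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2],
                   'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
                   'l2': ['b', 'd', 'd', 'f', 'e', 'f']})

# Each group x is a sub-DataFrame holding only l1 and l2;
# np.unique flattens it to 1-D and returns the sorted distinct values.
g = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x)))
print(g)
```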

Answer by Yaakov Bressler

Alternatively, you can use agg:

g = df.groupby('c')[['l1', 'l2']].agg(['unique'])
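
A sketch, assuming pandas 1.x or 2.x, of what the agg approach returns: agg(['unique']) works column by column, so the result is a DataFrame with one array of unique values per column per group (columns ('l1', 'unique') and ('l2', 'unique')), not one combined list. If a single combined list per group is wanted, keeping order of first appearance, one option (not from either answer) is to reshape with melt first and then call unique on the resulting value column. The variable names per_col and combined are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2],
                   'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
                   'l2': ['b', 'd', 'd', 'f', 'e', 'f']})

# Per-column unique values; the columns become a MultiIndex like ('l1', 'unique').
per_col = df.groupby('c')[['l1', 'l2']].agg(['unique'])
print(per_col)

# Combined unique values per group: melt stacks l1 and l2 into one 'value'
# column (all l1 rows first, then all l2 rows), then unique() keeps each
# group's distinct values in order of first appearance.
combined = (df.melt(id_vars='c', value_vars=['l1', 'l2'])
              .groupby('c')['value']
              .unique())
print(combined)
```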