Original question: http://stackoverflow.com/questions/36106490/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to get unique values from multiple columns in a pandas groupby
Asked by Fabio Lamanna
Starting from this dataframe df:
import pandas as pd
df = pd.DataFrame({'c':[1,1,1,2,2,2],'l1':['a','a','b','c','c','b'],'l2':['b','d','d','f','e','f']})
   c l1 l2
0  1  a  b
1  1  a  d
2  1  b  d
3  2  c  f
4  2  c  e
5  2  b  f
I would like to perform a groupby over the c column to get the unique values of the l1 and l2 columns. For one column I can do:
g = df.groupby('c')['l1'].unique()
which correctly returns:
c
1 [a, b]
2 [c, b]
Name: l1, dtype: object
but using:
g = df.groupby('c')['l1','l2'].unique()
raises:
AttributeError: 'DataFrameGroupBy' object has no attribute 'unique'
I know I can get the unique values for the two columns with (among others):
In [12]: np.unique(df[['l1','l2']])
Out[12]: array(['a', 'b', 'c', 'd', 'e', 'f'], dtype=object)
Is there a way to apply this method to the groupby in order to get something like:
c
1 [a, b, d]
2 [c, b, e, f]
Name: l1, dtype: object
Answered by ayhan
You can do it with apply:
import numpy as np
# Select both columns with a list (double brackets), then take the unique values of each group.
g = df.groupby('c')[['l1','l2']].apply(lambda x: list(np.unique(x)))
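The apply approach can be sketched end to end as follows. This assumes the sample frame from the question and a recent pandas version, where several columns are selected with a list. np.unique flattens each group's values and returns them sorted, so the order may differ from the order of appearance.

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    'c': [1, 1, 1, 2, 2, 2],
    'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
    'l2': ['b', 'd', 'd', 'f', 'e', 'f'],
})

# For each group, flatten the l1/l2 values into one array and keep
# the sorted unique entries as a plain Python list.
g = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x.to_numpy())))

print(g)
# Group 1 holds ['a', 'b', 'd'], group 2 holds ['b', 'c', 'e', 'f'].
```

The result is a Series indexed by c, with one list per group.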
Answered by Yaakov Bressler
Alternatively, you can use agg, which returns the unique values of each column separately:
g = df.groupby('c')[['l1','l2']].agg(['unique'])
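Because agg(['unique']) reports the unique values per column rather than across columns, an extra step is needed to get one merged list per group. The following is a minimal sketch, assuming the question's sample frame; the helper name merge_unique is hypothetical and is only one possible way to combine the per-column results.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    'c': [1, 1, 1, 2, 2, 2],
    'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
    'l2': ['b', 'd', 'd', 'f', 'e', 'f'],
})

# One array of unique values per group and per column; the result has
# MultiIndex columns ('l1', 'unique') and ('l2', 'unique').
per_column = df.groupby('c')[['l1', 'l2']].agg(['unique'])

def merge_unique(row):
    """Hypothetical helper: merge a row's per-column arrays into one list,
    dropping duplicates while keeping the order of first appearance."""
    merged = [value for arr in row for value in arr]
    return list(dict.fromkeys(merged))

combined = per_column.apply(merge_unique, axis=1)

print(per_column)
print(combined)
```

Unlike np.unique, this keeps the values in order of first appearance instead of sorting them.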