pandas get minimum of one column in group when groupby another
Original question: http://stackoverflow.com/questions/51074911/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but any reuse must also be licensed under CC BY-SA, link to the original URL, and attribute it to the original authors (not me): StackOverflow
Question by Legit Stack
I have a pandas dataframe that looks like this:
c y
0 9 0
1 8 0
2 3 1
3 6 2
4 1 3
5 2 3
6 5 3
7 4 4
8 0 4
9 7 4
I'd like to group by y and get the min and max of c, so that my new dataframe looks like this:
c y min max
0 9 0 8 9
1 8 0 8 9
2 3 1 3 3
3 6 2 6 6
4 1 3 1 5
5 2 3 1 5
6 5 3 1 5
7 4 4 0 7
8 0 4 0 7
9 7 4 0 7
I tried using df['min'] = df.groupby(['y'])['c'].min(), but that gave me some weird results: the first 175 rows of the min column were populated, but every row after that was NaN. Is that not how you're supposed to use the groupby method?
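A likely explanation, sketched with the question's sample data (and assuming the real y column holds integer labels that overlap the default 0..n-1 row index): groupby(...)['c'].min() returns one value per group, indexed by the group key, and assigning a Series to a column aligns on index labels, not on position. Rows whose index label happens to equal some group key receive that group's minimum, usually for the wrong group, and every other row gets NaN. If the real y values were 0 through 174, this would fill exactly the first 175 rows, though the question does not show the real data.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'c': [9, 8, 3, 6, 1, 2, 5, 4, 0, 7],
                   'y': [0, 0, 1, 2, 3, 3, 3, 4, 4, 4]})

# One value per group, indexed by the group keys 0..4 (not by the rows of df)
group_min = df.groupby('y')['c'].min()
print(group_min.index.tolist())  # [0, 1, 2, 3, 4]

# Column assignment aligns on index labels: rows 0-4 receive the minima of
# groups 0-4 (mostly the wrong group); rows 5-9 have no matching label, so NaN
df['min'] = group_min
print(df['min'].tolist())  # [8.0, 3.0, 6.0, 1.0, 0.0, nan, nan, nan, nan, nan]
```

Note that row 1 (y = 0) gets 3.0, the minimum of group 1, which shows the values are misplaced, not merely missing.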
Answer by Zero
Option 1: Use transform
dfc = df.groupby('y')['c']
# transform returns one value per row of df, aligned on df's index
df.assign(min=dfc.transform('min'), max=dfc.transform('max'))
Output:
c y min max
0 9 0 8 9
1 8 0 8 9
2 3 1 3 3
3 6 2 6 6
4 1 3 1 5
5 2 3 1 5
6 5 3 1 5
7 4 4 0 7
8 0 4 0 7
9 7 4 0 7
Or
df['min'] = dfc.transform('min')
df['max'] = dfc.transform('max')
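Why transform fixes the original attempt, as a small sketch on the question's sample data: unlike min(), transform('min') broadcasts each group's result back onto the rows of that group, so the returned Series has the same index as df and the column assignment lines up row for row.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'c': [9, 8, 3, 6, 1, 2, 5, 4, 0, 7],
                   'y': [0, 0, 1, 2, 3, 3, 3, 4, 4, 4]})

# transform repeats each group's minimum on every row of that group,
# so the result has exactly df's index and length
group_min = df.groupby('y')['c'].transform('min')
print(group_min.index.equals(df.index))  # True

df['min'] = group_min  # lines up row for row
df['max'] = df.groupby('y')['c'].transform('max')
print(df['min'].tolist())  # [8, 8, 3, 6, 1, 1, 1, 0, 0, 0]
print(df['max'].tolist())  # [9, 9, 3, 6, 5, 5, 5, 7, 7, 7]
```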
Option 2: Use join and agg
df.join(df.groupby('y')['c'].agg(['min', 'max']), on='y')
Output:
c y min max
0 9 0 8 9
1 8 0 8 9
2 3 1 3 3
3 6 2 6 6
4 1 3 1 5
5 2 3 1 5
6 5 3 1 5
7 4 4 0 7
8 0 4 0 7
9 7 4 0 7
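To see what join is matching against, here is a sketch on the question's sample data that inspects the intermediate aggregate: agg(['min', 'max']) yields one row per group, indexed by y, and join(on='y') looks up each row's y value in that index, repeating the group's statistics on every row of the group.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'c': [9, 8, 3, 6, 1, 2, 5, 4, 0, 7],
                   'y': [0, 0, 1, 2, 3, 3, 3, 4, 4, 4]})

# One row per group: the index is the group key y, the columns are the statistics
stats = df.groupby('y')['c'].agg(['min', 'max'])
print(stats.index.tolist())    # [0, 1, 2, 3, 4]
print(stats.columns.tolist())  # ['min', 'max']

# join(on='y') matches df's y column against stats' index; df's index is kept
out = df.join(stats, on='y')
print(out['min'].tolist())  # [8, 8, 3, 6, 1, 1, 1, 0, 0, 0]
print(out['max'].tolist())  # [9, 9, 3, 6, 5, 5, 5, 7, 7, 7]
```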
Option 3: Use merge and agg
df.merge(df.groupby('y')['c'].agg(['min', 'max']), right_index=True, left_on='y')
Output:
c y min max
0 9 0 8 9
1 8 0 8 9
2 3 1 3 3
3 6 2 6 6
4 1 3 1 5
5 2 3 1 5
6 5 3 1 5
7 4 4 0 7
8 0 4 0 7
9 7 4 0 7
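Option 3 spells out the same lookup explicitly: left_on='y' with right_index=True matches df's y column against the aggregate's index, which is the column-on-index lookup that join(on='y') performs. A sketch comparing the two on the question's sample data follows. One difference worth keeping in mind: merge defaults to how='inner' and join to how='left', so a row whose y has no matching group (for example a NaN key, which groupby drops by default) should be dropped by this merge but kept with NaN statistics by the join.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'c': [9, 8, 3, 6, 1, 2, 5, 4, 0, 7],
                   'y': [0, 0, 1, 2, 3, 3, 3, 4, 4, 4]})
stats = df.groupby('y')['c'].agg(['min', 'max'])

# Option 3: match df's y column against stats' index
merged = df.merge(stats, left_on='y', right_index=True)
# Option 2: the same lookup via join
joined = df.join(stats, on='y')

print(merged.columns.tolist())  # ['c', 'y', 'min', 'max']
# Every row finds its group here, so both yield the same statistics in the same order
same = (merged[['min', 'max']].to_numpy() == joined[['min', 'max']].to_numpy()).all()
print(same)  # True
```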
Answer by piRSquared
With NumPy shenanigans
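import numpy as np
# Assumes y holds small non-negative integers, used directly as array positions
c, y = df.c.values, df.y.values
n = y.max() + 1
# Seed each slot with the overall extreme, so any group member replaces or ties it
omax = np.full(n, c.min(), dtype=c.dtype)
omin = np.full(n, c.max(), dtype=c.dtype)
# ufunc.at is unbuffered, so repeated indices in y are all applied
np.maximum.at(omax, y, c)
np.minimum.at(omin, y, c)
df.assign(min=omin[y], max=omax[y])
Output: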
c y min max
0 9 0 8 9
1 8 0 8 9
2 3 1 3 3
3 6 2 6 6
4 1 3 1 5
5 2 3 1 5
6 5 3 1 5
7 4 4 0 7
8 0 4 0 7
9 7 4 0 7
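The NumPy version uses the y values directly as array positions, so it assumes y holds small non-negative integers. A hedged sketch of one way to lift that restriction is to first map arbitrary labels to dense integer codes with pd.factorize; the string labels below are a hypothetical variant of the question's data. This sketch also assumes y has no missing values: factorize gives NaN the code -1, which ufunc.at would treat as "last slot" and silently fold into the last group.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data with non-integer group labels
df = pd.DataFrame({'c': [9, 8, 3, 6, 1, 2, 5, 4, 0, 7],
                   'y': ['a', 'a', 'b', 'c', 'd', 'd', 'd', 'e', 'e', 'e']})

# factorize maps each distinct label to a dense integer code 0..k-1
codes, uniques = pd.factorize(df['y'])
c = df['c'].to_numpy()

# Seed each slot with the overall extreme, as in the answer above
omin = np.full(len(uniques), c.max(), dtype=c.dtype)
omax = np.full(len(uniques), c.min(), dtype=c.dtype)

# Unbuffered in-place reductions: every occurrence of a code is applied
np.minimum.at(omin, codes, c)
np.maximum.at(omax, codes, c)

# Index the per-group results by each row's code to broadcast them back
result = df.assign(min=omin[codes], max=omax[codes])
print(result['min'].tolist())  # [8, 8, 3, 6, 1, 1, 1, 0, 0, 0]
print(result['max'].tolist())  # [9, 9, 3, 6, 5, 5, 5, 7, 7, 7]
```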