pandas: get the min of one column within each group when grouping by another

Question

Asked by Legit Stack

I have a pandas dataframe that looks like this:

      c     y
0     9     0
1     8     0
2     3     1
3     6     2
4     1     3
5     2     3
6     5     3
7     4     4
8     0     4
9     7     4

I'd like to group by y and get the min and max of c, so that my new dataframe looks like this:

      c     y     min   max
0     9     0     8     9
1     8     0     8     9
2     3     1     3     3   
3     6     2     6     6 
4     1     3     1     5
5     2     3     1     5
6     5     3     1     5
7     4     4     0     7
8     0     4     0     7
9     7     4     0     7

I tried df['min'] = df.groupby(['y'])['c'].min(), but that gave me some weird results: the first 175 rows were populated in the min column, but all the remaining rows were NaN. Is that not how you're supposed to use the groupby method?

Answer 1

Answered by Zero

Option 1: Use transform

In [13]: dfc = df.groupby('y')['c']

In [14]: df.assign(min=dfc.transform('min'), max=dfc.transform('max'))
Out[14]:
   c  y  max  min
0  9  0    9    8
1  8  0    9    8
2  3  1    3    3
3  6  2    6    6
4  1  3    5    1
5  2  3    5    1
6  5  3    5    1
7  4  4    7    0
8  0  4    7    0
9  7  4    7    0

Or

In [15]: df['min'] = dfc.transform('min')

In [16]: df['max'] = dfc.transform('max')
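
As a rough sketch of why the assignment in the question produced NaN: groupby(...).min() returns one value per group, indexed by the distinct values of y, and assigning a Series to a column aligns on index labels rather than on position. That would be consistent with 175 populated rows if the real data has 175 group keys that coincide with the first row labels, though that is an assumption about data not shown here. transform instead returns a result with the same index as the original frame. The data is the question's sample; the names per_group, misaligned, and broadcast are illustrative only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "c": [9, 8, 3, 6, 1, 2, 5, 4, 0, 7],
    "y": [0, 0, 1, 2, 3, 3, 3, 4, 4, 4],
})

# One row per group: the index holds the distinct values of y,
# not the row labels of df.
per_group = df.groupby("y")["c"].min()
print(per_group.index.tolist())

# Column assignment aligns on index labels, so only rows whose label
# happens to equal a group key get a value; the others become NaN.
misaligned = df.assign(min=per_group)
print(misaligned["min"].isna().sum())

# transform broadcasts each group's result back onto every row of the group.
broadcast = df.groupby("y")["c"].transform("min")
print(broadcast.tolist())
```
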

Option 2: Use join and agg

In [30]: df.join(df.groupby('y')['c'].agg(['min', 'max']), on='y')
Out[30]:
   c  y  min  max
0  9  0    8    9
1  8  0    8    9
2  3  1    3    3
3  6  2    6    6
4  1  3    1    5
5  2  3    1    5
6  5  3    1    5
7  4  4    0    7
8  0  4    0    7
9  7  4    0    7

Option 3: Use merge and agg

In [28]: df.merge(df.groupby('y')['c'].agg(['min', 'max']), right_index=True, left_on='y')
Out[28]:
   c  y  min  max
0  9  0    8    9
1  8  0    8    9
2  3  1    3    3
3  6  2    6    6
4  1  3    1    5
5  2  3    1    5
6  5  3    1    5
7  4  4    0    7
8  0  4    0    7
9  7  4    0    7
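
The three options should be interchangeable on this data. A minimal self-contained check, assuming the question's sample frame; the names stats, via_transform, via_join, via_merge, and agree are made up for the illustration:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "c": [9, 8, 3, 6, 1, 2, 5, 4, 0, 7],
    "y": [0, 0, 1, 2, 3, 3, 3, 4, 4, 4],
})

grouped = df.groupby("y")["c"]
# One row per value of y, with columns "min" and "max".
stats = grouped.agg(["min", "max"])

via_transform = df.assign(min=grouped.transform("min"),
                          max=grouped.transform("max"))
via_join = df.join(stats, on="y")
via_merge = df.merge(stats, left_on="y", right_index=True)

# Compare raw values so the check does not depend on the index type.
agree = all(
    list(other.columns) == ["c", "y", "min", "max"]
    and (other.to_numpy() == via_transform.to_numpy()).all()
    for other in (via_join, via_merge)
)
print(agree)
```

transform avoids building the intermediate per-group table, while join and merge compute both statistics in a single agg call; which reads better is largely a matter of taste.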

Answer 2

Answered by piRSquared

With NumPy shenanigans

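import numpy as np

# Assumes df.y holds non-negative integers, so it can index the arrays directly.
n = df.y.max() + 1
# Seed each slot with the opposite global extreme; ufunc.at then updates it per group.
omax = np.ones(n, df.c.values.dtype) * df.c.values.min()
omin = np.ones(n, df.c.values.dtype) * df.c.values.max()
np.maximum.at(omax, df.y.values, df.c.values)
np.minimum.at(omin, df.y.values, df.c.values)

df.assign(min=omin[df.y], max=omax[df.y])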

   c  y  min  max
0  9  0    8    9
1  8  0    8    9
2  3  1    3    3
3  6  2    6    6
4  1  3    1    5
5  2  3    1    5
6  5  3    1    5
7  4  4    0    7
8  0  4    0    7
9  7  4    0    7
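
The snippet above uses the values of y directly as array positions, which relies on y holding small non-negative integers. A hedged sketch of the same idea for arbitrary labels, assuming pd.factorize is an acceptable way to encode them; the string labels below are made-up sample data, not from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical data with string group labels.
df = pd.DataFrame({
    "c": [9, 8, 3, 6, 1, 2, 5],
    "y": ["a", "a", "b", "c", "d", "d", "d"],
})

# Map each label to a dense integer code 0..k-1 so it can index an array.
codes, uniques = pd.factorize(df["y"])
c = df["c"].to_numpy()

# Seed each group's slot with the opposite global extreme,
# so that any member of the group overrides it.
omax = np.full(len(uniques), c.min(), dtype=c.dtype)
omin = np.full(len(uniques), c.max(), dtype=c.dtype)

# Unbuffered in-place reduction: repeated codes all contribute.
np.maximum.at(omax, codes, c)
np.minimum.at(omin, codes, c)

# Broadcast the per-group results back onto the rows.
result = df.assign(min=omin[codes], max=omax[codes])
print(result)
```
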
