Pandas: Difference between largest and smallest value within group

Question

Asked by David

Given a data frame that looks like this

GROUP VALUE
  1     5
  2     2
  1     10
  2     20
  1     7

I would like to compute the difference between the largest and smallest value within each group. That is, the result should be

GROUP   DIFF
  1      5
  2      18

What is an easy way to do this in Pandas?

What is a fast way to do this in Pandas for a data frame with about 2 million rows and 1 million groups?

Answer 1

Answered by piRSquared

Using @unutbu's df.

Per the timings, unutbu's solution is best over large data sets.

import pandas as pd
import numpy as np

df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})

df.groupby('GROUP')['VALUE'].agg(np.ptp)

GROUP
1     5
2    18
Name: VALUE, dtype: int64

np.ptp (docs) returns the range of an array (maximum minus minimum).
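
As a quick check of what np.ptp computes, here is a minimal sketch on a plain NumPy array. The array simply holds the VALUE entries of GROUP 1 from the example; the variable name `values` is only illustrative.

```python
import numpy as np

# VALUE entries of GROUP 1 in the example data frame
values = np.array([5, 10, 7])

# np.ptp ("peak to peak") is the maximum minus the minimum of the array
print(np.ptp(values))                # 5
print(values.max() - values.min())   # 5
```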

timing
small df

large df
df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 100, VALUE=np.random.rand(1000000)))

large df, many groups
df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 10000, VALUE=np.random.rand(1000000)))
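
The timing output itself is not reproduced here. One way to rerun the comparison is sketched below with the standard-library `timeit` module. The function names (`ptp_agg`, `max_min_agg`, `apply_lambda`) and the fixed random seed are assumptions added for illustration; the three approaches are the ones given in the answers on this page. No timings are quoted, since they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape as the "many groups" frame; the seed is an added assumption for repeatability
rng = np.random.default_rng(0)
df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 10000, VALUE=rng.random(1000000)))

def ptp_agg(d):
    # Answer 1: pass np.ptp to agg
    return d.groupby('GROUP')['VALUE'].agg(np.ptp)

def max_min_agg(d):
    # Answer 2: built-in 'max' and 'min' aggregators, then subtract
    r = d.groupby('GROUP')['VALUE'].agg(['max', 'min'])
    return r['max'] - r['min']

def apply_lambda(d):
    # Answer 3: apply a lambda to every group
    return d.groupby('GROUP')['VALUE'].apply(lambda g: g.max() - g.min())

for fn in (ptp_agg, max_min_agg, apply_lambda):
    # Best of three single runs, in seconds
    seconds = min(timeit.repeat(lambda: fn(df), number=1, repeat=3))
    print(f'{fn.__name__}: {seconds:.3f} s')
```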

Answer 2

Answered by unutbu

groupby/agg generally performs best when you take advantage of the built-in aggregators such as 'max' and 'min'. So to obtain the difference, first compute the max and min and then subtract:

import pandas as pd
df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})
result = df.groupby('GROUP')['VALUE'].agg(['max','min'])
result['diff'] = result['max']-result['min']
print(result[['diff']])

yields

       diff
GROUP      
1         5
2        18
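
If the result should match the GROUP / DIFF layout asked for in the question, the subtraction can be followed by a rename and `reset_index()`. This is a small sketch on the same example frame; the variable name `diff` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})

result = df.groupby('GROUP')['VALUE'].agg(['max', 'min'])
# Subtract the built-in aggregates, name the column DIFF,
# and turn the GROUP index back into an ordinary column
diff = (result['max'] - result['min']).rename('DIFF').reset_index()
print(diff)
#    GROUP  DIFF
# 0      1     5
# 1      2    18
```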

Answer 3

Answered by ASGM

You can use groupby(), min(), and max():

df.groupby('GROUP')['VALUE'].apply(lambda g: g.max() - g.min())
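
The same result can also be written without a lambda by calling `max()` and `min()` on the grouped column and subtracting. The sketch below only checks that the two forms agree on the example frame; the names `via_apply` and `via_builtin` are illustrative, and no speed difference is measured here.

```python
import pandas as pd

df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})

grouped = df.groupby('GROUP')['VALUE']

# The lambda is called once per group
via_apply = grouped.apply(lambda g: g.max() - g.min())

# Built-in reductions over all groups, then an element-wise subtraction
via_builtin = grouped.max() - grouped.min()

print(via_apply.equals(via_builtin))  # True
```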
