Using Pandas to Find Minimum Values of Grouped Rows

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/41450963/

Time: 2020-09-14 02:42:52  Source: igfitidea

Tags: python, pandas, numpy, dataframe

Asked by shishy

This might be a trivial question but I'm still trying to figure out pandas/numpy.

So, suppose I have a table with the following structure:

group_id | col1 | col2 | col3 |  "A"   |  "B"
   x     |   1  |   2  |  3   |  NaN   |   1
   x     |   3  |   2  |  3   |   1    |   1 
   x     |   4  |   2  |  3   |   2    |   1
   y     |   1  |   2  |  3   |  NaN   |   3 
   y     |   3  |   2  |  3   |   3    |   3 
   z     |   3  |   2  |  3   |   10   |   2
   z     |   2  |   2  |  3   |   6    |   2
   z     |   4  |   2  |  3   |   4    |   2
   z     |   4  |   2  |  3   |   2    |   2

Note that there is a group_id that groups the rows. So at the beginning, I have the values for columns group_id and col1-col3.

Then for each row, if col1, col2, or col3 have value = 1, then "A" is NaN, otherwise the value is based on a formula (irrelevant for here so I put some numbers in place).

That, I know how to do using:

# `value` stands in for the formula result, which is not shown here
df["A"] = np.where((df['col1'] == 1) | (df['col2'] == 1) | (df['col3'] == 1), np.nan, value)
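
A minimal runnable sketch of this step, using the first three rows of the table. The formula is not given in the question, so `value` here is a purely hypothetical stand-in (the sum of the three columns), and `has_one` is just an illustrative name for the mask.

```python
import numpy as np
import pandas as pd

# First three rows of the table above (col1-col3 only).
df = pd.DataFrame({
    "col1": [1, 3, 4],
    "col2": [2, 2, 2],
    "col3": [3, 3, 3],
})

# Hypothetical stand-in for the real formula, which the question does not show.
value = df["col1"] + df["col2"] + df["col3"]

# True for rows where at least one of the three columns equals 1.
has_one = (df["col1"] == 1) | (df["col2"] == 1) | (df["col3"] == 1)

# NaN where the mask is True, the formula result everywhere else.
df["A"] = np.where(has_one, np.nan, value)

print(df)
```

Each comparison needs its own parentheses because `|` binds more tightly than `==` in Python.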

But for column "B", I need to fill it in with the minimum of values from column A for a specific group.

So for example, "B" is equal to 1 for all rows in group "x" because the minimum value in column "A" across all of the group "x" rows is 1.

Similarly, for rows in group "y", the minimum value is 3, and for group "z" the minimum value is 2. How exactly do I do that using pandas...? It's confusing me a little more because the number of rows for a specific group can be of varying size.

If they were all the same size I could just say fill it with the minimum of values in a pre-set range.

I hope that made sense; please let me know if I should provide a clearer example or clarify anything!

Answered by Ted Petrou

To get the minimum of column A for each group, use transform:

df.groupby('group_id')['A'].transform('min')
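
As a rough sketch of how this fills column "B", the snippet below rebuilds the relevant part of the question's table (only group_id and "A" are needed) and assigns the result of transform back to the frame. The sample values are copied from the question; assigning to `df["B"]` is an assumption about how the result would be used.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table (group_id and "A" only).
df = pd.DataFrame({
    "group_id": ["x", "x", "x", "y", "y", "z", "z", "z", "z"],
    "A": [np.nan, 1, 2, np.nan, 3, 10, 6, 4, 2],
})

# transform("min") computes one minimum per group and broadcasts it back
# to every row of that group, so the result lines up with df's index.
df["B"] = df.groupby("group_id")["A"].transform("min")

print(df)
```

Because the result has the same length and index as the original frame, groups of different sizes are not a problem, and NaN values in "A" are skipped when each group's minimum is taken.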

Answered by piRSquared

  • focus on just ['col1', 'col2', 'col3']
  • see if they are equal to 1 with eq(1), which is equivalent to == 1
  • see if any are equal to 1 along axis=1 with any(axis=1)
  • use loc to make the assignment

anyone = df[['col1', 'col2', 'col3']].eq(1).any(axis=1)
df.loc[anyone, 'A'] = np.nan
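
The steps above can be tried on a small frame. This is only a sketch: the rows are the first three from the question's table, and the starting numbers in "A" are placeholders for the formula results (the 5.0 in the first row is hypothetical, since that row ends up as NaN anyway).

```python
import numpy as np
import pandas as pd

# First three rows of the question's table; "A" holds placeholder numbers.
df = pd.DataFrame({
    "col1": [1, 3, 4],
    "col2": [2, 2, 2],
    "col3": [3, 3, 3],
    "A": [5.0, 1.0, 2.0],
})

# Row-wise mask: True where any of the three columns equals 1.
anyone = df[["col1", "col2", "col3"]].eq(1).any(axis=1)

# Overwrite "A" with NaN only on the masked rows; other rows keep their value.
df.loc[anyone, "A"] = np.nan

print(df)
```

Passing `axis=1` by keyword is the safer spelling, since recent pandas versions do not accept the axis as a positional argument to `DataFrame.any`.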

numpy equivalent

anyone = (df[['col1', 'col2', 'col3']].values == 1).any(1)
df.A = np.where(anyone, np.nan, df.A)
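
To see the pieces working together, here is a hedged end-to-end sketch on the question's full table: it checks that the pandas and NumPy masks agree, blanks out "A" with np.where, and then fills "B" with the per-group minimum via transform. The 0.0 entries in "A" are hypothetical placeholders for rows that get overwritten with NaN, and names such as `cols`, `mask_pd` and `mask_np` are illustrative only.

```python
import numpy as np
import pandas as pd

# The question's table; 0.0 in "A" is a placeholder on rows that become NaN.
df = pd.DataFrame({
    "group_id": ["x", "x", "x", "y", "y", "z", "z", "z", "z"],
    "col1": [1, 3, 4, 1, 3, 3, 2, 4, 4],
    "col2": [2] * 9,
    "col3": [3] * 9,
    "A": [0.0, 1.0, 2.0, 0.0, 3.0, 10.0, 6.0, 4.0, 2.0],
})
cols = ["col1", "col2", "col3"]

# pandas mask and NumPy mask for "any of the three columns equals 1".
mask_pd = df[cols].eq(1).any(axis=1)
mask_np = (df[cols].values == 1).any(axis=1)
print((mask_pd.to_numpy() == mask_np).all())

# NumPy-style assignment: NaN on masked rows, existing value elsewhere.
df["A"] = np.where(mask_np, np.nan, df["A"])

# Per-group minimum of "A", broadcast back to every row of the group.
df["B"] = df.groupby("group_id")["A"].transform("min")

print(df)
```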