Finding the minimum value of grouped rows with Pandas

Question

Asked by shishy

This might be a trivial question but I'm still trying to figure out pandas/numpy.

So, suppose I have a table with the following structure:

group_id | col1 | col2 | col3 |  "A"   |  "B"
   x     |   1  |   2  |  3   |  NaN   |   1
   x     |   3  |   2  |  3   |   1    |   1 
   x     |   4  |   2  |  3   |   2    |   1
   y     |   1  |   2  |  3   |  NaN   |   3 
   y     |   3  |   2  |  3   |   3    |   3 
   z     |   3  |   2  |  3   |   10   |   2
   z     |   2  |   2  |  3   |   6    |   2
   z     |   4  |   2  |  3   |   4    |   2
   z     |   4  |   2  |  3   |   2    |   2

Note that there is a group_id that groups elements in each row. So at the beginning, I have the values for columns group_id and col1-col3.

Then for each row, if col1, col2, or col3 have value = 1, then "A" is NaN, otherwise the value is based on a formula (irrelevant for here so I put some numbers in place).

That, I know how to do using:

df["A"] = np.where((df['col1'] == 1) | (df['col2'] == 1) | (df['col3'] == 1), np.nan, value)

But for column "B", I need to fill it in with the minimum of values from column A for a specific group.

So for example, "B" is equal to 1 for all rows in group "x" because the minimum value in column A across all of the group "x" rows is 1.

Similarly, for rows in group "y", the minimum value is 3, and for group "z" the minimum value is 2. How exactly do I do that using pandas...? It's confusing me a little more because the number of rows for a specific group can be of varying size.

If they were all the same size I could just say fill it with the minimum of values in a pre-set range.

I hope that made sense; please let me know if I should provide a clearer example or clarify anything!

Answer 1

Answered by Ted Petrou

To get the minimum of column A for each group, use transform:

df.groupby('group_id')['A'].transform('min')
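
As a minimal sketch of how this fills column "B", the example below rebuilds the table from the question. The DataFrame construction is my own reconstruction; only the groupby/transform call comes from the answer. transform returns a Series aligned with the original index, so groups with different numbers of rows need no special handling, and NaN values are skipped by min by default.

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's table (column "A" uses the sample numbers given there)
df = pd.DataFrame({
    "group_id": list("xxxyyzzzz"),
    "col1": [1, 3, 4, 1, 3, 3, 2, 4, 4],
    "col2": [2] * 9,
    "col3": [3] * 9,
    "A": [np.nan, 1, 2, np.nan, 3, 10, 6, 4, 2],
})

# Per-group minimum of A, broadcast back to every row of its group
df["B"] = df.groupby("group_id")["A"].transform("min")
print(df)
```

Every row in group "x" gets 1, group "y" gets 3, and group "z" gets 2, matching the "B" column in the question.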

Answer 2

Answered by piRSquared

focus on just ['col1', 'col2', 'col3']
see if they are equal to 1 with eq(1), which is equivalent to == 1
see if any are equal to 1 along axis=1 with any
use loc to make the assignment

anyone = df[['col1', 'col2', 'col3']].eq(1).any(axis=1)
df.loc[anyone, 'A'] = np.nan
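
To make the masking step concrete, here is a small sketch on three rows taken from group "x" of the question. The starting values in "A" are placeholders of my own, since the question does not give the real formula. The axis is passed by keyword because recent pandas versions no longer accept it positionally for any.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 3, 4],
    "col2": [2, 2, 2],
    "col3": [3, 3, 3],
    "A": [5.0, 1.0, 2.0],  # placeholder values, not from a real formula
})

# True for rows where at least one of col1..col3 equals 1
anyone = df[["col1", "col2", "col3"]].eq(1).any(axis=1)

# Blank out A only on the flagged rows; other rows keep their value
df.loc[anyone, "A"] = np.nan
print(df)
```

Only the first row has a 1 in col1 through col3, so only its "A" becomes NaN.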

numpy equivalent

anyone = (df[['col1', 'col2', 'col3']].values == 1).any(axis=1)
df['A'] = np.where(anyone, np.nan, df['A'])
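
Putting the NumPy mask together with the group-wise minimum, the whole task from the question could be sketched as follows. The sum used for value is a hypothetical stand-in for the question's unspecified formula, and the small frame is illustrative data, not the original table.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group_id": ["x", "x", "x", "y", "y"],
    "col1": [1, 3, 4, 1, 5],
    "col2": [2, 2, 2, 2, 2],
    "col3": [3, 3, 3, 3, 3],
})

# Hypothetical stand-in for the formula the question leaves out
value = df["col1"] + df["col2"] + df["col3"]

cols = ["col1", "col2", "col3"]
# Row-wise mask computed on the underlying NumPy array
anyone = (df[cols].to_numpy() == 1).any(axis=1)

# A is NaN where any of the columns equals 1, otherwise the formula's value
df["A"] = np.where(anyone, np.nan, value)

# B is the minimum of A within each group, repeated on every row of that group
df["B"] = df.groupby("group_id")["A"].transform("min")
print(df)
```

np.where builds the whole column in one pass, which is why it needs an explicit "else" value, whereas the loc version only overwrites the flagged rows of an existing column.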
