Original source: http://stackoverflow.com/questions/41450963/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Using Pandas to Find Minimum Values of Grouped Rows
Asked by shishy
This might be a trivial question but I'm still trying to figure out pandas/numpy.
So, suppose I have a table with the following structure:
group_id | col1 | col2 | col3 | "A" | "B"
x | 1 | 2 | 3 | NaN | 1
x | 3 | 2 | 3 | 1 | 1
x | 4 | 2 | 3 | 2 | 1
y | 1 | 2 | 3 | NaN | 3
y | 3 | 2 | 3 | 3 | 3
z | 3 | 2 | 3 | 10 | 2
z | 2 | 2 | 3 | 6 | 2
z | 4 | 2 | 3 | 4 | 2
z | 4 | 2 | 3 | 2 | 2
Note that there is a group_id that groups elements in each row. So at the beginning, I have the values for columns group_id and col1-col3.
Then for each row, if col1, col2, or col3 has the value 1, then "A" is NaN; otherwise the value is based on a formula (irrelevant here, so I put some numbers in as placeholders).
That, I know how to do using:
df["A"] = np.where((df['col1'] == 1) | (df['col2'] == 1) | (df['col3'] == 1), np.nan, value)
But for column "B", I need to fill it in with the minimum of values from column A for a specific group.
So for example, "B" is equal to 1 for all rows in group "x" because the minimum value in column A across all of the group "x" rows is 1.
Similarly, for rows in group "y", the minimum value is 3, and for group "z" the minimum value is 2. How exactly do I do that using pandas...? It's confusing me a little more because the number of rows for a specific group can be of varying size.
If they were all the same size I could just say fill it with the minimum of values in a pre-set range.
I hope that made sense; please let me know if I should provide a clearer example or clarify anything!
Answered by Ted Petrou
To get the minimum of column A for each group, use transform:
df.groupby('group_id')['A'].transform('min')
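As a minimal sketch of how that result could be stored in column "B", the example below rebuilds only the group_id and "A" columns from the question's table (the DataFrame construction is an assumption added for illustration; the original post does not show it). transform returns a Series aligned with the original index, so groups of different sizes are not a problem, and NaN values are skipped when the minimum is taken.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question (only the columns needed here).
df = pd.DataFrame({
    "group_id": ["x", "x", "x", "y", "y", "z", "z", "z", "z"],
    "A": [np.nan, 1, 2, np.nan, 3, 10, 6, 4, 2],
})

# transform("min") computes one minimum per group and broadcasts it back
# to every row of that group, so the result has the same length as df.
df["B"] = df.groupby("group_id")["A"].transform("min")

print(df)
```

With this data, "B" comes out as 1 for group "x", 3 for group "y", and 2 for group "z", matching the table in the question.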
Answered by piRSquared
- focus on just ['col1', 'col2', 'col3']
- see if they are equal to 1 with eq(1), which is equivalent to == 1
- see if any are equal to 1 along axis=1 with any(axis=1)
- use loc to make the assignment
anyone = df[['col1', 'col2', 'col3']].eq(1).any(axis=1)
df.loc[anyone, 'A'] = np.nan
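The steps above can be sketched on the three group "x" rows from the question. The starting values in "A" are placeholders (the question's real formula is not given, and the 5.0 in the first row is made up so the overwrite is visible), and the axis is passed by keyword on the assumption that a recent pandas version is in use.

```python
import numpy as np
import pandas as pd

# The three group "x" rows from the question; "A" holds placeholder values.
df = pd.DataFrame({
    "col1": [1, 3, 4],
    "col2": [2, 2, 2],
    "col3": [3, 3, 3],
    "A": [5.0, 1.0, 2.0],  # 5.0 is a made-up value that should be replaced
})

# Boolean mask: True for rows where at least one of the three columns equals 1.
anyone = df[["col1", "col2", "col3"]].eq(1).any(axis=1)

# Overwrite "A" with NaN only on the masked rows; other rows are untouched.
df.loc[anyone, "A"] = np.nan

print(df)
```

Only the first row has a 1 in col1, so only that row's "A" becomes NaN.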
numpy equivalent:
anyone = (df[['col1', 'col2', 'col3']].values == 1).any(1)
df.A = np.where(anyone, np.nan, df.A)
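As a rough check that the two variants agree, the sketch below builds both masks on the group "y" rows from the question and compares them. The DataFrame and the variable names mask_pd, mask_np, and new_a are introduced here for illustration, and the 7.0 in "A" is a made-up placeholder.

```python
import numpy as np
import pandas as pd

# The two group "y" rows from the question; 7.0 is a made-up placeholder.
df = pd.DataFrame({
    "col1": [1, 3],
    "col2": [2, 2],
    "col3": [3, 3],
    "A": [7.0, 3.0],
})
cols = ["col1", "col2", "col3"]

# pandas version: element-wise comparison, then row-wise any.
mask_pd = df[cols].eq(1).any(axis=1)

# numpy version: the same comparison on the underlying array.
mask_np = (df[cols].to_numpy() == 1).any(axis=1)

# np.where picks NaN where the mask is True and the existing "A" elsewhere.
new_a = np.where(mask_np, np.nan, df["A"])
df["A"] = new_a

print(df)
```

The pandas mask is an index-aligned Series while the numpy mask is a plain array; for a single DataFrame like this one, both select the same rows.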