Python: Extract the row with the maximum value in each group of a pandas DataFrame

Question

Asked by user1140126

A similar question was asked here: Python: Getting the Row which has the max value in groups using groupby

However, I need only one record per group, even if more than one record in that group has the maximum value.

In the example below, I need only one record for "s2"; it doesn't matter which one.

>>> import pandas as pd
>>> df = pd.DataFrame({'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
...                    'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
...                    'Value': [1, 2, 3, 4, 5, 6],
...                    'count': [3, 2, 5, 10, 10, 6]})
>>> df
   Mt Sp  Value  count
0  s1  a      1      3
1  s1  b      2      2
2  s2  c      3      5
3  s2  d      4     10
4  s2  e      5     10
5  s3  f      6      6
>>> idx = df.groupby('Mt')['count'].transform('max') == df['count']  # True where count equals its group's maximum
>>> df[idx]
   Mt Sp  Value  count
0  s1  a      1      3
3  s2  d      4     10
4  s2  e      5     10
5  s3  f      6      6
>>>
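
The mask above keeps every row that ties for the maximum. As a minimal sketch, assuming it is acceptable to keep whichever tied row comes first in the original order, the filtered frame can be reduced to one row per group with drop_duplicates on the Mt column:

```python
import pandas as pd

df = pd.DataFrame({'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
                   'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'count': [3, 2, 5, 10, 10, 6]})

# True where a row's count equals the maximum count of its Mt group
is_max = df.groupby('Mt')['count'].transform('max') == df['count']

# Several rows can tie for the maximum; keep only the first one per group
result = df[is_max].drop_duplicates(subset='Mt', keep='first')
print(result)
```

For "s2" this keeps row 3 (Sp d), because it appears before row 4 in the frame.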

Answer 1

Accepted answer by waitingkuo

You can use first():

In [14]: df.groupby('Mt').first()
Out[14]: 
   Sp  Value  count
Mt                 
s1  a      1      3
s2  c      3      5
s3  f      6      6

Update

Set as_index=False to keep Mt as a regular column instead of the index:

In [28]: df.groupby('Mt', as_index=False).first()
Out[28]: 
   Mt Sp  Value  count
0  s1  a      1      3
1  s2  c      3      5
2  s3  f      6      6

Update Again

Sorry, I misunderstood what you meant. If you want the row with the maximum count in each group, sort by count in descending order first:

In [196]: df.sort_values('count', ascending=False, kind='stable').groupby('Mt', as_index=False).first()
Out[196]: 
   Mt Sp  Value  count
0  s1  a      1      3
1  s2  d      4     10
2  s3  f      6      6
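
A sketch of an equivalent approach that skips the groupby: sort by count in descending order, then keep the first row seen for each Mt with drop_duplicates. The kind='stable' argument is an assumption added here so that tied rows stay in their original order; sort_index() only restores the original row order.

```python
import pandas as pd

df = pd.DataFrame({'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
                   'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'count': [3, 2, 5, 10, 10, 6]})

# Largest counts first; a stable sort keeps tied rows in their original order
ordered = df.sort_values('count', ascending=False, kind='stable')

# The first row seen for each Mt is now that group's maximum
result = ordered.drop_duplicates(subset='Mt').sort_index()
print(result)
```

Unlike groupby().first(), which takes the first non-null value column by column, drop_duplicates keeps whole rows, so every column in the result comes from the same original row.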

Answer 2

Answer by Roman Pekar

To get the first occurrence of the maximum count in each group, you can use idxmax():

>>> df.loc[df.groupby('Mt')['count'].idxmax()]
   Mt Sp  Value  count
0  s1  a      1      3
3  s2  d      4     10
5  s3  f      6      6
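
idxmax() returns index labels, not positions, so the selection should go through .loc. With the default RangeIndex, labels and positions coincide, which is why .iloc can appear to work on the example frame. A small sketch, using a hypothetical string index (r0 to r5, invented for illustration), shows the label-based lookup:

```python
import pandas as pd

# Same data as the question, but with a hypothetical non-default string index
df = pd.DataFrame({'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
                   'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'count': [3, 2, 5, 10, 10, 6]},
                  index=['r0', 'r1', 'r2', 'r3', 'r4', 'r5'])

# Index label of the first row holding the maximum count in each group
labels = df.groupby('Mt')['count'].idxmax()
print(labels.tolist())  # ['r0', 'r3', 'r5']

# Select by label; positional .iloc cannot take these string labels
result = df.loc[labels]
print(result)
```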

Answer 3

Answer by Ian Schultz

Building on Roman Pekar's answer, I found that the following code works:

from math import isnan
# idxmax() gives NaN for a group whose 'count' is all NaN; skip those, then select by label
df.loc[[int(x) for x in df.groupby(by=df.Mt).apply(lambda g: g['count'].idxmax()).values if not isnan(x)]]

Note the isnan condition, as my application had some nan entries in the column we are maximizing over.
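
An alternative sketch for NaN values in the column being maximized: drop the rows whose count is NaN before grouping, so idxmax only sees real numbers and a group made up entirely of NaN drops out of the result. The s4 group and the NaN values below are made-up data for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Mt': ['s1', 's1', 's2', 's2', 's3', 's4'],
                   'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'count': [3, np.nan, 5, 10, 6, np.nan]})

# Ignore rows with a missing count; the all-NaN group (s4) disappears entirely
valid = df.dropna(subset=['count'])

# Label of the first maximum per remaining group, then select those rows
result = df.loc[valid.groupby('Mt')['count'].idxmax()]
print(result)
```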
