Python: Extract row with maximum value in a group pandas dataframe
Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/19818756/
Extract row with maximum value in a group pandas dataframe
Asked by user1140126
A similar question is asked here: Python : Getting the Row which has the max value in groups using groupby
However, I need just one record per group, even if more than one record in that group has the maximum value.
In the example below, I need one record for "s2". For me it doesn't matter which one.
>>> from pandas import DataFrame
>>> df = DataFrame({'Sp':['a','b','c','d','e','f'], 'Mt':['s1', 's1', 's2','s2','s2','s3'], 'Value':[1,2,3,4,5,6], 'count':[3,2,5,10,10,6]})
>>> df
   Mt Sp  Value  count
0  s1  a      1      3
1  s1  b      2      2
2  s2  c      3      5
3  s2  d      4     10
4  s2  e      5     10
5  s3  f      6      6
>>> idx = df.groupby(['Mt'])['count'].transform('max') == df['count']
>>> df[idx]
   Mt Sp  Value  count
0  s1  a      1      3
3  s2  d      4     10
4  s2  e      5     10
5  s3  f      6      6
Accepted answer by waitingkuo
You can use first
In [14]: df.groupby('Mt').first()
Out[14]:
   Sp  Value  count
Mt
s1  a      1      3
s2  c      3      5
s3  f      6      6
Update
Set as_index=False to achieve your goal:
In [28]: df.groupby('Mt', as_index=False).first()
Out[28]:
   Mt Sp  Value  count
0  s1  a      1      3
1  s2  c      3      5
2  s3  f      6      6
Update Again
Sorry, I misunderstood what you meant. If you want the row with the maximum count in each group, sort by count in descending order first, so that first() picks it up:
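# DataFrame.sort was removed in pandas 0.20; sort_values is its replacement.
# The output is as originally posted; which of the tied s2 rows (d or e) comes first may vary by pandas version.
In [196]: df.sort_values('count', ascending=False).groupby('Mt', as_index=False).first()
Out[196]:
   Mt Sp  Value  count
0  s1  a      1      3
1  s2  e      5     10
2  s3  f      6      6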
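The sort-then-pick idea can also be sketched on a current pandas release as below. This is only an illustration, not part of the answer above: using a stable sort together with drop_duplicates instead of first() is my substitution, and the names ordered and top are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'count': [3, 2, 5, 10, 10, 6]})

# Sort so that the largest count of each group comes first; a stable
# sort (mergesort) makes the order of tied rows deterministic.
ordered = df.sort_values('count', ascending=False, kind='mergesort')

# Keep only the first row seen for each Mt (one max-count row per group),
# then restore the original row order.
top = ordered.drop_duplicates('Mt').sort_index()
print(top)
```

One reason to consider drop_duplicates here: first() takes the first non-null value of each column separately, whereas drop_duplicates always keeps whole rows, so values from different rows are never combined when a column has missing data.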
Answer by Roman Pekar
To get the first occurrence of the maximum count, you can use the pandas.DataFrame.idxmax() function:
>>> df.iloc[df.groupby(['Mt']).apply(lambda x: x['count'].idxmax())]
   Mt Sp  Value  count
0  s1  a      1      3
3  s2  d      4     10
5  s3  f      6      6
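A slightly shorter variant of the same idxmax idea, offered as a sketch rather than as the answerer's code: it selects the count column before calling idxmax and looks rows up with .loc. idxmax returns index labels, so .iloc only works in the snippet above because df has the default 0..n-1 index, where labels and positions coincide. The names labels and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
                   'Value': [1, 2, 3, 4, 5, 6],
                   'count': [3, 2, 5, 10, 10, 6]})

# For each group, idxmax gives the index label of the first row
# holding the maximum count.
labels = df.groupby('Mt')['count'].idxmax()

# idxmax yields labels, so select with .loc rather than .iloc.
result = df.loc[labels]
print(result)
```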
Answer by Ian Schultz
Playing off of Roman Pekar's answer, I found that the following code works:
from math import isnan
# Skip groups whose idxmax is NaN before converting the remaining labels to int
df.iloc[[int(x) for x in df.groupby(by=df.Mt).apply(lambda g: g['count'].idxmax()).values if not isnan(x)]]
Note the isnan condition: my application had some NaN entries in the column we are maximizing over.
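Another way to cope with NaN entries, sketched here under my own assumptions rather than taken from the answer, is to drop the rows with a missing count before grouping, so that no group can produce a NaN label. The sample data, including the all-NaN group s4, is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one NaN inside group s2, and a group (s4) that is all NaN.
df = pd.DataFrame({'Sp': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
                   'Mt': ['s1', 's1', 's2', 's2', 's2', 's3', 's4'],
                   'count': [3, 2, 5, np.nan, 10, 6, np.nan]})

# Remove rows with no count, so every remaining group has a real maximum.
valid = df.dropna(subset=['count'])

# Pick the first max-count row of each remaining group by label.
result = valid.loc[valid.groupby('Mt')['count'].idxmax()]
print(result)
```

Groups that contain only NaN counts (s4 here) simply do not appear in the result.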