Python Pandas Dataframe select row by max value in group

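Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/32459325/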

Date: 2020-08-19 11:36:54  Source: igfitidea

python pandas

Asked by user636322

I have a dataframe which was created via a df.pivot:

type                             start  end
F_Type         to_date                     
A              20150908143000    345    316
B              20150908140300    NaN    480
               20150908140600    NaN    120
               20150908143000  10743   8803
C              20150908140100    NaN   1715
               20150908140200    NaN   1062
               20150908141000    NaN    145
               20150908141500    418    NaN
               20150908141800    NaN    450
               20150908142900   1973   1499
               20150908143000  19522  16659
D              20150908143000    433     65
E              20150908143000   7290   7375
F              20150908143000      0      0
G              20150908143000   1796    340

I would like to filter the frame and return a single row for each 'F_Type': the row with the maximum 'to_date'. I would like to get the following dataframe:

type                             start  end
F_Type         to_date                     
A              20150908143000    345    316
B              20150908143000  10743   8803
C              20150908143000  19522  16659
D              20150908143000    433     65
E              20150908143000   7290   7375
F              20150908143000      0      0
G              20150908143000   1796    340

Thanks.

Accepted answer by unutbu

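A standard approach is to use groupby(keys)[column].idxmax(). However, to select the desired rows using idxmax, you need idxmax to return unique index values. One way to obtain a unique index is to call reset_index. Here reset_index also turns the F_Type and to_date index levels into ordinary columns, which groupby(...)['to_date'] needs, since it selects columns, not index levels.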
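To see why uniqueness matters, here is a small sketch on a hypothetical frame (the names `grp`, `val` and the labels `"x"`, `"y"` are made up for illustration). When two rows share a label, the label returned by idxmax matches both of them, so `.loc` returns more than one row per group; after reset_index every row has its own integer label and the selection is exact.

```python
import pandas as pd

# Hypothetical frame whose index contains a duplicate label ("x" twice).
df = pd.DataFrame(
    {"grp": ["a", "a", "b"], "val": [1, 5, 3]},
    index=["x", "x", "y"],
)

# idxmax returns the *label* of each group's maximum row.
labels = df.groupby("grp")["val"].idxmax()
print(labels.tolist())  # ['x', 'y']

# "x" matches two rows, so .loc yields 3 rows instead of one per group.
ambiguous = df.loc[labels]
print(len(ambiguous))  # 3

# reset_index gives every row a unique integer label (0, 1, 2).
flat = df.reset_index()
picked = flat.loc[flat.groupby("grp")["val"].idxmax()]
print(picked["val"].tolist())  # [5, 3]
```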

Once you obtain the index values from groupby(keys)[column].idxmax(), you can then select the entire rows using .loc:

In [20]: flat = df.reset_index()

In [21]: flat.loc[flat.groupby(['F_Type'])['to_date'].idxmax()].set_index(['F_Type', 'to_date'])
Out[21]: 
                       start    end
F_Type to_date                     
A      20150908143000    345    316
B      20150908143000  10743   8803
C      20150908143000  19522  16659
D      20150908143000    433     65
E      20150908143000   7290   7375
F      20150908143000      0      0
G      20150908143000   1796    340
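For a self-contained check, the sketch below rebuilds the rows for F_Type A, B and C from the question. As an assumption, it builds the (F_Type, to_date) MultiIndex directly with set_index instead of reproducing the original df.pivot call, which the question does not show. It then resets the index, takes idxmax per group, selects with `.loc` on that reset frame, and restores the MultiIndex.

```python
import numpy as np
import pandas as pd

# Rows for F_Type A, B and C copied from the question's table.
# Assumption: set_index stands in for the question's (unshown) df.pivot call.
rows = [
    ("A", 20150908143000, 345, 316),
    ("B", 20150908140300, np.nan, 480),
    ("B", 20150908140600, np.nan, 120),
    ("B", 20150908143000, 10743, 8803),
    ("C", 20150908140100, np.nan, 1715),
    ("C", 20150908140200, np.nan, 1062),
    ("C", 20150908141000, np.nan, 145),
    ("C", 20150908141500, 418, np.nan),
    ("C", 20150908141800, np.nan, 450),
    ("C", 20150908142900, 1973, 1499),
    ("C", 20150908143000, 19522, 16659),
]
df = pd.DataFrame(rows, columns=["F_Type", "to_date", "start", "end"])
df = df.set_index(["F_Type", "to_date"])

# Move the index levels into columns; rows get unique labels 0..n-1.
flat = df.reset_index()

# For each F_Type, the label of the row with the largest to_date.
best = flat.groupby("F_Type")["to_date"].idxmax()

# Select those rows by label (.loc on the same frame) and restore the MultiIndex.
result = flat.loc[best].set_index(["F_Type", "to_date"])
print(result)
```

Because the labels come from `flat`, they are looked up in `flat`; using them against the original MultiIndexed `df` would mix up label spaces.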

Note: idxmax returns index labels, not necessarily ordinals (positions). After reset_index the index labels happen to coincide with the ordinals, but since idxmax returns labels, it is better to always use idxmax together with df.loc, not df.iloc (as I originally did in this post).
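A minimal illustration of the label-versus-position distinction, using a hypothetical Series whose integer labels are not positions: idxmax gives a label that only `.loc` interprets correctly, while `argmax` gives a position that pairs with `.iloc`.

```python
import pandas as pd

# Hypothetical Series: integer labels 5, 6, 7 at positions 0, 1, 2.
s = pd.Series([10, 30, 20], index=[5, 6, 7])

label = s.idxmax()            # a label: 6
by_label = s.loc[label]       # 30, correct lookup by label
by_position = s.iloc[s.argmax()]  # argmax returns position 1, so .iloc is right here

# Treating the label as a position fails: there is no position 6.
try:
    s.iloc[label]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

print(label, by_label, by_position, out_of_bounds)
```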