Python Pandas Dataframe select row by max value in group
Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/32459325/
Asked by user636322
I have a dataframe which was created via a df.pivot:
type                   start    end
F_Type to_date
A      20150908143000    345    316
B      20150908140300    NaN    480
       20150908140600    NaN    120
       20150908143000  10743   8803
C      20150908140100    NaN   1715
       20150908140200    NaN   1062
       20150908141000    NaN    145
       20150908141500    418    NaN
       20150908141800    NaN    450
       20150908142900   1973   1499
       20150908143000  19522  16659
D      20150908143000    433     65
E      20150908143000   7290   7375
F      20150908143000      0      0
G      20150908143000   1796    340
I would like to filter this and return a single row for each 'F_Type': the row with the maximum 'to_date'. The result should be the following dataframe:
type                   start    end
F_Type to_date
A      20150908143000    345    316
B      20150908143000  10743   8803
C      20150908143000  19522  16659
D      20150908143000    433     65
E      20150908143000   7290   7375
F      20150908143000      0      0
G      20150908143000   1796    340
Thanks..
Accepted answer by unutbu
A standard approach is to use groupby(keys)[column].idxmax(). However, to select the desired rows using idxmax, you need idxmax to return unique index values. One way to obtain a unique index is to call reset_index.
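To make concrete what idxmax returns per group, here is a minimal sketch. It assumes a small, hypothetical subset of the question's data that has already been flattened (as if reset_index had been called), so every row carries a unique integer label; the variable names flat and labels are illustrative only.

```python
import pandas as pd

# Hypothetical subset of the question's data, already flattened so the
# index is the unique integer range 0..4.
flat = pd.DataFrame({
    "F_Type": ["A", "B", "B", "B", "C"],
    "to_date": [20150908143000, 20150908140300, 20150908140600,
                20150908143000, 20150908143000],
    "end": [316, 480, 120, 8803, 16659],
})

# For each F_Type, idxmax gives the index label of the row holding the
# largest to_date: A -> 0, B -> 3, C -> 4.
labels = flat.groupby("F_Type")["to_date"].idxmax()
print(labels)
```

Because each label identifies exactly one row, it can be used afterwards to pick out a single row per group.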
Once you obtain the index values from groupby(keys)[column].idxmax(), you can then select the entire row using df.loc:
In [20]: df.loc[df.reset_index().groupby(['F_Type'])['to_date'].idxmax()]
Out[20]:
                       start    end
F_Type to_date
A      20150908143000    345    316
B      20150908143000  10743   8803
C      20150908143000  19522  16659
D      20150908143000    433     65
E      20150908143000   7290   7375
F      20150908143000      0      0
G      20150908143000   1796    340
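The same steps can also be written out one at a time. The sketch below is not the answer's exact one-liner: it is a variant that looks the labels up in the flattened frame itself and then restores the (F_Type, to_date) index, which may be more robust on recent pandas versions, where .loc resolves labels against the frame it is called on. The sample rows are a hypothetical subset of the question's data, and the names rows, flat, labels and result are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the question's pivot output: a
# (F_Type, to_date) MultiIndex with 'start' and 'end' columns.
rows = [
    ("A", 20150908143000, 345, 316),
    ("B", 20150908140300, np.nan, 480),
    ("B", 20150908140600, np.nan, 120),
    ("B", 20150908143000, 10743, 8803),
    ("C", 20150908141500, 418, np.nan),
    ("C", 20150908143000, 19522, 16659),
    ("D", 20150908143000, 433, 65),
]
df = pd.DataFrame(rows, columns=["F_Type", "to_date", "start", "end"])
df = df.set_index(["F_Type", "to_date"])

# Step 1: flatten the MultiIndex so every row has a unique integer label.
flat = df.reset_index()
# Step 2: per F_Type, find the label of the row with the maximum to_date.
labels = flat.groupby("F_Type")["to_date"].idxmax()
# Step 3: select those rows by label from the same flattened frame,
# then restore the original (F_Type, to_date) index.
result = flat.loc[labels].set_index(["F_Type", "to_date"])
print(result)
```

Keeping the lookup on the frame that produced the labels avoids any mismatch between the integer labels and the original MultiIndex.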
Note: idxmax returns index labels, not necessarily ordinals. After using reset_index, the index labels happen to also be ordinals, but since idxmax returns labels (not ordinals), it is better to always use idxmax in conjunction with df.loc, not df.iloc (as I originally did in this post).
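The difference between labels and ordinals shows up as soon as the index is not 0, 1, 2, and so on. A minimal sketch, assuming a hypothetical three-row frame whose index labels are 10, 20 and 30:

```python
import pandas as pd

# Hypothetical frame whose index labels (10, 20, 30) are not positions 0, 1, 2.
df = pd.DataFrame(
    {"F_Type": ["A", "A", "B"], "to_date": [1, 2, 3]},
    index=[10, 20, 30],
)

# idxmax returns labels: 20 for group A and 30 for group B.
labels = df.groupby("F_Type")["to_date"].idxmax()

# Label-based selection finds the intended rows.
by_label = df.loc[labels]
print(by_label)

# Position-based selection treats the labels as ordinals. Positions 20 and
# 30 do not exist in a three-row frame, so pandas raises IndexError.
try:
    df.iloc[labels.tolist()]
    iloc_failed = False
except IndexError:
    iloc_failed = True
print("iloc failed:", iloc_failed)
```

In this case iloc fails loudly; with other index values it could instead return the wrong rows without any error, which is why pairing idxmax with df.loc is the safer habit.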