Python Pandas DataFrame: select rows by the maximum value within each group

Question

Asked by user636322

I have a dataframe which was created via a df.pivot:

type                             start  end
F_Type         to_date                     
A              20150908143000    345    316
B              20150908140300    NaN    480
               20150908140600    NaN    120
               20150908143000  10743   8803
C              20150908140100    NaN   1715
               20150908140200    NaN   1062
               20150908141000    NaN    145
               20150908141500    418    NaN
               20150908141800    NaN    450
               20150908142900   1973   1499
               20150908143000  19522  16659
D              20150908143000    433     65
E              20150908143000   7290   7375
F              20150908143000      0      0
G              20150908143000   1796    340

I would like to filter the dataframe and return a single row for each 'F_Type', keeping only the row with the maximum 'to_date'. I would like to return the following dataframe:

type                             start  end
F_Type         to_date                     
A              20150908143000    345    316
B              20150908143000  10743   8803
C              20150908143000  19522  16659
D              20150908143000    433     65
E              20150908143000   7290   7375
F              20150908143000      0      0
G              20150908143000   1796    340

Thanks..

Answer 1

Accepted answer by unutbu

A standard approach is to use groupby(keys)[column].idxmax(). However, to select the desired rows using idxmax, you need idxmax to return unique index values. One way to obtain a unique index is to call reset_index.

Once you obtain the index values from groupby(keys)[column].idxmax(), you can then select the entire row using df.loc:

In [20]: df.loc[df.reset_index().groupby(['F_Type'])['to_date'].idxmax()]
Out[20]: 
                       start    end
F_Type to_date                     
A      20150908143000    345    316
B      20150908143000  10743   8803
C      20150908143000  19522  16659
D      20150908143000    433     65
E      20150908143000   7290   7375
F      20150908143000      0      0
G      20150908143000   1796    340
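
For readers who want to try this outside the original session, here is a minimal self-contained sketch of the same idea. The data is a small subset of the question's table, and the names flat, idx and result are illustrative assumptions, not part of the original answer. It resets the index first so that the labels returned by idxmax are unique, selects with loc on that flattened frame, and then restores the two-level index.

```python
import numpy as np
import pandas as pd

# Illustrative subset of the question's data (assumed, not the full table)
df = pd.DataFrame(
    {
        "F_Type": ["A", "B", "B", "B", "C", "C"],
        "to_date": [
            20150908143000,
            20150908140300,
            20150908140600,
            20150908143000,
            20150908142900,
            20150908143000,
        ],
        "start": [345, np.nan, np.nan, 10743, 1973, 19522],
        "end": [316, 480, 120, 8803, 1499, 16659],
    }
).set_index(["F_Type", "to_date"])

# Flatten the MultiIndex so every row has a unique integer label
flat = df.reset_index()

# For each F_Type, get the label of the row with the largest to_date
idx = flat.groupby("F_Type")["to_date"].idxmax()

# Select those rows by label, then restore the original two-level index
result = flat.loc[idx].set_index(["F_Type", "to_date"])
print(result)
```

Selecting from the flattened frame keeps the labels from idxmax and the frame being indexed in agreement, which is the point of calling reset_index.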

Note: idxmax returns index labels, not necessarily ordinals. After using reset_index the index labels happen to also be ordinals, but since idxmax is returning labels (not ordinals) it is better to always use idxmax in conjunction with df.loc, not df.iloc (as I originally did in this post).
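
The difference between labels and ordinals is easier to see on a frame whose index is not 0..n-1. The sketch below uses a hypothetical frame (the column names group and value and the index values are made up for illustration) to show that the values returned by idxmax are labels and therefore belong with loc.

```python
import pandas as pd

# Hypothetical frame whose index labels are not positions
df = pd.DataFrame(
    {"group": ["x", "x", "y", "y"], "value": [1, 5, 7, 2]},
    index=[10, 20, 30, 40],
)

# idxmax returns the index labels of the per-group maxima: 20 and 30
labels = df.groupby("group")["value"].idxmax()

# Label-based selection picks the intended rows
picked = df.loc[labels]
print(picked)

# df.iloc[labels] would raise IndexError here, because 20 and 30
# are not valid positions in a four-row frame
```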
