pandas: selecting particular rows from a groupby object in Python

Question

Asked by Shiva Prakash

id    marks  year 
1     18      2013
1     25      2012
3     16      2014
2     16      2013
1     19      2013
3     25      2013
2     18      2014

Suppose I now group the above by id with the following Python command:
grouped = file.groupby(file.id)

I would like to get a new file containing, for each group, only the row with the most recent year, that is, the highest year in that group.

Please let me know the command. I am trying apply, but it only gives me a boolean result. I want the entire row with the latest year.

Answer 1

Answered by EdChum

I cobbled this together based on this question: "Python : Getting the Row which has the max value in groups using groupby"

So basically we can group by the 'id' column, then call transform on the 'year' column and create a boolean index where the year matches the max year value for each 'id':

In [103]:
df[df.groupby(['id'])['year'].transform('max') == df['year']]
Out[103]:
   id  marks  year
0   1     18  2013
2   3     16  2014
4   1     19  2013
6   2     18  2014
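
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the table from the question is loaded into a DataFrame called df. The names latest_year, latest_rows and one_per_id are only illustrative, and the idxmax variant is an extra option that is not part of the answer above. It may be useful because id 1 has two rows from 2013 and the boolean mask keeps both.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "id":    [1, 1, 3, 2, 1, 3, 2],
    "marks": [18, 25, 16, 16, 19, 25, 18],
    "year":  [2013, 2012, 2014, 2013, 2013, 2013, 2014],
})

# transform("max") returns a Series aligned with df, holding each
# row's group-wide maximum year.
latest_year = df.groupby("id")["year"].transform("max")

# Boolean mask: keep every row whose year equals its group's maximum.
# Ties are kept, so id 1 contributes two rows here.
latest_rows = df[df["year"] == latest_year]

# Assumed alternative (not from the answer): exactly one row per id,
# using the index label of the first occurrence of the maximum year.
one_per_id = df.loc[df.groupby("id")["year"].idxmax()]

print(latest_rows)
print(one_per_id)
```

If a file is needed, as the question asks, either result could then be written out with to_csv, for example latest_rows.to_csv("latest.csv", index=False), where the file name is just a placeholder.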
