
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original: StackOverflow, http://stackoverflow.com/questions/28175330/

Date: 2020-09-13 22:52:58  Source: igfitidea

Selecting a particular row from a groupby object in Python

Tags: python, pandas, group-by

Asked by Shiva Prakash

id    marks  year 
1     18      2013
1     25      2012
3     16      2014
2     16      2013
1     19      2013
3     25      2013
2     18      2014

Suppose I now group the data above on id with this Python command:
grouped = file.groupby(file.id)
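For reference, the setup described above can be reproduced with a small sketch. The values are copied from the table in the question; building the frame by hand (rather than reading it from a file) and keeping the variable name `file` are assumptions made only for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
# The name `file` follows the question; it holds a DataFrame.
file = pd.DataFrame({
    "id":    [1, 1, 3, 2, 1, 3, 2],
    "marks": [18, 25, 16, 16, 19, 25, 18],
    "year":  [2013, 2012, 2014, 2013, 2013, 2013, 2014],
})

# Group the rows by the values of the id column.
grouped = file.groupby(file.id)

# Iterating a GroupBy object yields (key, sub-DataFrame) pairs.
for key, group in grouped:
    print("id =", key)
    print(group)
```

Each group is an ordinary DataFrame holding the rows that share one id, so the task is to pick, within each of them, the row or rows with the highest year.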

I would like to get a new file that contains, for each group, only the row with the most recent year, that is, the highest year in the group.

Please let me know the command. I am trying with apply, but it only gives me a boolean expression. I want the entire row with the latest year.
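The question does not show the attempted apply call, so the following is only a guess at the kind of expression that returns booleans instead of rows. It assumes the question's table is held in a DataFrame named `df` (a hypothetical name).

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "id":    [1, 1, 3, 2, 1, 3, 2],
    "marks": [18, 25, 16, 16, 19, 25, 18],
    "year":  [2013, 2012, 2014, 2013, 2013, 2013, 2014],
})

# Hypothetical attempt: compare each year with its group's maximum inside apply.
# The comparison yields one True/False value per row, not the rows themselves.
flags = df.groupby("id")["year"].apply(lambda s: s == s.max())
print(flags)
```

A result like this marks which rows qualify; it still has to be used to select rows from the original frame before the full rows come back.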

Answered by EdChum

I cobbled this together using this question: Python : Getting the Row which has the max value in groups using groupby

So basically we can group by the 'id' column, then call transform on the 'year' column and create a boolean index where the year matches the max year value for each 'id':
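The steps just described can be spelled out one at a time. This is a minimal sketch, assuming the question's table is loaded into a DataFrame named `df`; the intermediate names `group_max` and `mask` are introduced here purely for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "id":    [1, 1, 3, 2, 1, 3, 2],
    "marks": [18, 25, 16, 16, 19, 25, 18],
    "year":  [2013, 2012, 2014, 2013, 2013, 2013, 2014],
})

# Step 1: transform returns a Series aligned with df, holding for every row
# the maximum year found in that row's id group.
group_max = df.groupby("id")["year"].transform("max")

# Step 2: the boolean index is True where a row's year equals its group maximum.
mask = df["year"] == group_max

# Step 3: indexing df with the mask keeps the complete matching rows.
result = df[mask]
print(result)
```

Because transform keeps the original index, the mask lines up with `df` row by row, which is what makes the direct comparison in step 2 possible.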

In [103]: df[df.groupby(['id'])['year'].transform('max') == df['year']]
Out[103]:
   id  marks  year
0   1     18  2013
2   3     16  2014
4   1     19  2013
6   2     18  2014
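Note that id 1 appears twice in this output, because two of its rows share the latest year, 2013. The question does not say how such ties should be handled. If exactly one row per id is wanted, one possible variant (not part of the original answer) is to look up the index label of the first maximum in each group with idxmax. The sketch again assumes a DataFrame named `df`.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "id":    [1, 1, 3, 2, 1, 3, 2],
    "marks": [18, 25, 16, 16, 19, 25, 18],
    "year":  [2013, 2012, 2014, 2013, 2013, 2013, 2014],
})

# idxmax gives, per id, the index label of the first row holding the max year.
latest_idx = df.groupby("id")["year"].idxmax()

# Selecting those labels returns one full row per id.
one_per_id = df.loc[latest_idx]
print(one_per_id)
```

This keeps the first of the tied rows in each group; the transform-based comparison is the better fit when all tied rows should be kept.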