
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/42252273/

Time: 2020-09-14 02:59:23 Source: igfitidea

Python Pandas sorting after groupby and aggregate

Tags: python, sorting, pandas, grouping

Asked by Tomas Rasymas

I am trying to sort a pandas DataFrame after grouping and aggregating, and I am stuck. My data:


import numpy as np
import pandas as pd

data = {'from_year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'name': ['John', 'John1', 'John', 'John', 'John4', 'John', 'John1', 'John6'],
        'out_days': [11, 8, 10, 15, 11, 6, 10, 4]}
persons = pd.DataFrame(data, columns=["from_year", "name", "out_days"])

# Total out_days per (from_year, name); the result has a two-level row index
# and a two-level column label ("out_days", "sum")
days_off_yearly = persons.groupby(["from_year", "name"]).agg({"out_days": [np.sum]})

print(days_off_yearly)

After that, my data is sorted by the group keys:


                out_days
                     sum
from_year name          
2010      John        17
2011      John        15
          John1       18
2012      John        10
          John4       11
          John6        4

I want to sort the data by from_year and by the out_days sum, both in descending order, and expect the result to be:


                out_days
                     sum
from_year name          
2012      John4       11
          John        10
          John6        4    
2011      John1       18
          John        15
2010      John        17

I tried:


print(days_off_yearly.sort_values(["from_year", ("out_days", "sum")], ascending=False).head(10))

But I get KeyError: 'from_year'.
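A likely cause of the KeyError: after the aggregation, from_year and name are no longer columns but levels of the row index, and the only column is the two-level label ("out_days", "sum"). The short sketch below rebuilds the question's frame and inspects both axes; it uses the string "sum" instead of np.sum, an assumption that produces the same column label without importing NumPy.

```python
import pandas as pd

data = {'from_year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'name': ['John', 'John1', 'John', 'John', 'John4', 'John', 'John1', 'John6'],
        'out_days': [11, 8, 10, 15, 11, 6, 10, 4]}
persons = pd.DataFrame(data)

# Same aggregation as in the question ("sum" instead of np.sum)
days_off_yearly = persons.groupby(["from_year", "name"]).agg({"out_days": ["sum"]})

# The group keys now live in the row index ...
print(list(days_off_yearly.index.names))       # ['from_year', 'name']
# ... and the only column is a (level 0, level 1) tuple
print(days_off_yearly.columns.tolist())        # [('out_days', 'sum')]
print("from_year" in days_off_yearly.columns)  # False
```

This is why the accepted answer moves the keys back into columns with reset_index before sorting.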


Any help is appreciated.


Accepted answer by jezrael

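You can use sort_values, but first call reset_index (so that from_year and name become ordinary columns that sort_values can find) and then set_index to restore the MultiIndex: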


# simpler aggregation: select out_days and sum it, which returns a Series
days_off_yearly = persons.groupby(["from_year", "name"])['out_days'].sum()
print(days_off_yearly)
from_year  name 
2010       John     17
2011       John     15
           John1    18
2012       John     10
           John4    11
           John6     4
Name: out_days, dtype: int64

print(days_off_yearly.reset_index()                                    # keys back to columns
                     .sort_values(['from_year', 'out_days'], ascending=False)  # both keys descending
                     .set_index(['from_year', 'name']))                # restore the MultiIndex
                 out_days
from_year name           
2012      John4        11
          John         10
          John6         4
2011      John1        18
          John         15
2010      John         17
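As an alternative sketch, assuming pandas 0.23 or later (where DataFrame.sort_values accepts index level names in `by` alongside column labels), the reset_index/set_index round trip can be skipped by turning the aggregated Series into a one-column DataFrame and sorting it directly:

```python
import pandas as pd

data = {'from_year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'name': ['John', 'John1', 'John', 'John', 'John4', 'John', 'John1', 'John6'],
        'out_days': [11, 8, 10, 15, 11, 6, 10, 4]}
persons = pd.DataFrame(data)

# Simple aggregation, then convert the Series to a DataFrame with column "out_days"
days_off_yearly = persons.groupby(["from_year", "name"])["out_days"].sum().to_frame()

# "from_year" is an index level and "out_days" is a column; both may appear in `by`
result = days_off_yearly.sort_values(["from_year", "out_days"], ascending=False)
print(result)
```

The row order matches the answer's output, and the MultiIndex is never dropped, so there is no need to rebuild it with set_index.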