Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/39275294/

Sort by certain order (Situation: pandas DataFrame Groupby)

Tags: python, sorting, pandas

Asked by SUNDONG

I want to change the order of the days in the output of the code below.
What I want is a result in the order (Mon, Tue, Wed, Thu, Fri, Sat, Sun).
In other words, how do I sort by key in a predefined order?

Here is my code, which needs some tweaking:

f8 = df_toy_indoor2.groupby(['device_id', 'day'])['dwell_time'].sum()

print(f8)

Current result:

device_id                         day
device_112                        Thu     436518
                                  Wed     636451
                                  Fri     770307
                                  Tue     792066
                                  Mon     826862
                                  Sat     953503
                                  Sun    1019298
device_223                        Mon    2534895
                                  Thu    2857429
                                  Tue    3303173
                                  Fri    3548178
                                  Wed    3822616
                                  Sun    4213633
                                  Sat    4475221

Desired result:

device_id                         day
device_112                        Mon     826862  
                                  Tue     792066
                                  Wed     636451 
                                  Thu     436518
                                  Fri     770307
                                  Sat     953503
                                  Sun    1019298
device_223                        Mon    2534895
                                  Tue    3303173
                                  Wed    3822616
                                  Thu    2857429
                                  Fri    3548178
                                  Sat    4475221
                                  Sun    4213633


Here, type(df_toy_indoor2.groupby(['device_id', 'day'])['dwell_time']) is the class 'pandas.core.groupby.SeriesGroupBy'.

I have found .sort_values(), but it only sorts by values.
I would like some pointers on how to impose a custom order so that I can use it in further data manipulation.
Thanks in advance.

Answered by PdevG

It took me some time, but I found the solution: reindex does what you want. See my code example:

import pandas as pd

a = [1, 2] * 2 + [2, 1] * 3 + [1, 2]
b = ['Mon', 'Wed', 'Thu', 'Fri'] * 3
c = list(range(12))
df = pd.DataFrame(data=[a,b,c]).T
df.columns = ['device', 'day', 'value']
df = df.groupby(['device', 'day']).sum()

gives:

            value
device day       
1      Fri      7
       Mon      0
       Thu     12
       Wed     14
2      Fri     14
       Mon     12
       Thu      6
       Wed      1

Then reindex on the 'day' level:

df.reindex(['Mon', 'Wed', 'Thu', 'Fri'], level='day')

or, more conveniently (credit to burhan):

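import calendar

# Note: calendar.day_abbr is locale-dependent ('Mon' ... 'Sun' in an English locale).
df.reindex(list(calendar.day_abbr), level='day')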

gives:

            value
device day       
1      Mon      0
       Wed     14
       Thu     12
       Fri      7
2      Mon     12
       Wed      1
       Thu      6
       Fri     14
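
Applied to a Series shaped like the asker's f8, the same idea might look like the sketch below. The frame df_toy and its values are hypothetical stand-ins for df_toy_indoor2, and the week list is spelled out by hand so the result does not depend on the locale.

```python
import pandas as pd

# Hypothetical toy data standing in for df_toy_indoor2 (values are made up).
df_toy = pd.DataFrame({
    'device_id': ['device_112'] * 4 + ['device_223'] * 3,
    'day': ['Thu', 'Mon', 'Sun', 'Mon', 'Wed', 'Tue', 'Sat'],
    'dwell_time': [10, 20, 30, 40, 50, 60, 70],
})

# groupby sorts the 'day' level alphabetically by default.
f8 = df_toy.groupby(['device_id', 'day'])['dwell_time'].sum()

# Reorder only the 'day' level; days missing from the data are simply skipped.
week = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
f8_ordered = f8.reindex(week, level='day')
print(f8_ordered)
```

The device_id level keeps its order, and only the days inside each device are rearranged.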

Answered by root

Set the 'day' column to a categorical dtype, and make sure that the list of days you pass as categories is in the order you want. Performing the groupby will then sort by that order automatically, and any other sort on the column will also follow the order you specified.

import numpy as np
import pandas as pd

# Initial setup.
np.random.seed([3,1415])
n = 100
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
df = pd.DataFrame({
    'device_id': np.random.randint(1,3,n),
    'day': np.random.choice(days, n),
    'dwell_time':np.random.random(n)
    })


# Set as category, groupby, and sort.
df['day'] = df['day'].astype("category", categories=days, ordered=True)
df = df.groupby(['device_id', 'day']).sum()

Update: astype no longer accepts the categories and ordered arguments; use a CategoricalDtype instead:

category_day = pd.api.types.CategoricalDtype(categories=days, ordered=True)
df['day'] = df['day'].astype(category_day)

The resulting output:

               dwell_time
device_id day            
1         Mon    4.428626
          Tue    3.259319
          Wed    2.436024
          Thu    0.909724
          Fri    4.974137
          Sat    5.583778
          Sun    2.687258
2         Mon    3.117923
          Tue    2.427154
          Wed    1.943927
          Thu    4.599547
          Fri    2.628887
          Sat    6.247520
          Sun    2.716886

Note that this method works for any type of customized sorting. For example, if you had a column with entries 'a', 'b', 'c', and wanted it to be sorted in a non-standard order, e.g. 'c', 'a', 'b', you'd just do the same type of procedure: specify the column as categorical with your categories being in the non-standard order you want.
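
As a minimal sketch of the 'c', 'a', 'b' case just described (the frame df_letters and its values are made up for illustration), the same procedure could look like this:

```python
import pandas as pd

# Hypothetical data: a column of letters to be ordered as c < a < b.
df_letters = pd.DataFrame({
    'letter': ['a', 'b', 'c', 'a', 'c'],
    'value': [1, 2, 3, 4, 5],
})

# The category list defines the sort order.
custom = pd.api.types.CategoricalDtype(categories=['c', 'a', 'b'], ordered=True)
df_letters['letter'] = df_letters['letter'].astype(custom)

# A stable sort follows the category order and keeps ties in their original order.
sorted_letters = df_letters.sort_values('letter', kind='mergesort')

# groupby follows the same order; observed=True keeps only categories present in the data.
totals = df_letters.groupby('letter', observed=True)['value'].sum()
print(sorted_letters)
print(totals)
```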

Answered by ayhan

Probably not the best way, but as far as I know you cannot pass a function/mapping to sort_values. As a workaround, I generally use assign to add a new column and sort by that column. In your example, that also requires resetting the index first (and setting it back).

days = {'Mon': 1, 'Tue': 2, 'Wed': 3, 'Thu': 4, 'Fri': 5, 'Sat': 6, 'Sun': 7}
f8 = f8.reset_index()
(f8.assign(day_num=f8['day'].map(days))
   .sort_values(['device_id', 'day_num'])
   .set_index(['device_id', 'day'])
   .drop('day_num', axis=1))
Out: 
                                            0
device_id                        day         
0d4fd55bb363bf6f6f7f8b3342cd0467 Mon   826862
                                 Tue   792066
                                 Wed   636451
                                 Thu   436518
                                 Fri   770307
                                 Sat   953503
                                 Sun  1019298
f6258edf9145d1c0404e6f3d7a27a29d Mon  2534895
                                 Tue  3303173
                                 Wed  3822616
                                 Thu  2857429
                                 Fri  3548178
                                 Sat  4475221
                                 Sun  4213633
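
For readers on pandas 1.1 or newer, sort_values accepts a key argument, which may remove the need for a helper column. The sketch below is one possible way to use it, not part of the original answer; the Series f8_demo and its values are hypothetical.

```python
import pandas as pd

# Position of each day in the week.
order = {'Mon': 0, 'Tue': 1, 'Wed': 2, 'Thu': 3, 'Fri': 4, 'Sat': 5, 'Sun': 6}

# Hypothetical stand-in for the grouped Series f8.
f8_demo = pd.Series(
    [5, 3, 8, 1],
    index=pd.MultiIndex.from_tuples(
        [('dev_b', 'Sun'), ('dev_a', 'Wed'), ('dev_a', 'Mon'), ('dev_b', 'Sat')],
        names=['device_id', 'day'],
    ),
    name='dwell_time',
)

result = (
    f8_demo.reset_index()
    # Sort by weekday position instead of alphabetically (requires pandas >= 1.1).
    .sort_values('day', key=lambda col: col.map(order))
    # A stable sort on device_id keeps the weekday order within each device.
    .sort_values('device_id', kind='mergesort')
    .set_index(['device_id', 'day'])['dwell_time']
)
print(result)
```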

Answered by JCVanHamme

If you sort the dataframe prior to the groupby and pass sort=False to groupby, pandas will maintain the order of your sort (by default, groupby re-sorts the group keys). The first thing you'll have to do is come up with a good way to sort the days of the week. One way of doing that is to assign an int representing the day of the week to each row, then sort on that column. For example:

import pandas

df = pandas.DataFrame(
    columns=['device_id', 'day', 'dwell_time'], 
    data=[[1, 'Wed', 35], [1, 'Mon', 63], [2, 'Sat', 83], [2, 'Fri', 82]]
)

# Map each day abbreviation to its position in the week (Mon=0 ... Sun=6).
df['day_of_week'] = df.apply(
    lambda x: ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'].index(x.day), 
    1
)

# sort_values replaces the removed DataFrame.sort; sort=False keeps the pre-sorted order.
print(df.sort_values(['device_id', 'day_of_week']).groupby(['device_id', 'day'], sort=False)['dwell_time'].sum())

yields:

device_id  day    dwell_time
1          Mon    63
           Wed    35
2          Fri    82
           Sat    83