Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/47864691/

Time: 2020-09-14 04:55:17  Source: igfitidea

Pandas group by weekday (M/T/W/T/F/S/S)

python pandas pandas-groupby

Asked by mannaroth

I have a pandas dataframe containing a time series (as index) of the form YYYY-MM-DD ('arrival_date') and I'd like to group by each of the weekdays (Monday to Sunday) in order to calculate for the other columns the mean, median, std etc. I should have in the end only seven rows and so far I've only found out how to group by week, which aggregates everything weekly.

import pandas as pd

# Reading the data
df_data = pd.read_csv('data.csv', delimiter=',')

# Parsing the YYYYMMDD values in 'arrival_date' as datetimes
# (assigning to the column, so the rest of the data frame is kept)
df_data['arrival_date'] = pd.to_datetime(df_data['arrival_date'], format='%Y%m%d')

# Converting the time series column to index
df_data = df_data.set_index('arrival_date')

# Grouping by week (= ~52 rows per year)
week_df = df_data.resample('W').mean()

Is there a simple way to achieve my goal in pandas? I was thinking of selecting every 7th element and performing operations on the resulting array, but that seems unnecessarily complex.

The head of the data frame looks like this

       arrival_date    price_1    price_2         price_3       price_4
2       20170816      75.945298  1309.715056     71.510215      22.721958
3       20170817      68.803269  1498.639663     64.675232      22.759137
4       20170818      73.497144  1285.122022     65.620260      24.381532
5       20170819      78.556828  1377.318509     74.028607      26.882429
6       20170820      57.092189  1239.530625     51.942213      22.056378
7       20170821      76.278975  1493.385548     74.801641      27.471604
8       20170822      79.006604  1241.603185     75.360606      28.250994
9       20170823      76.097351  1243.586084     73.459963      24.500618
10      20170824      64.860259  1231.325899     63.205554      25.015120
11      20170825      70.407325   975.091107     64.180692      27.177654
12      20170826      87.742284  1351.306100     79.049023      27.860549
13      20170827      58.014005  1208.424489     51.963388      21.049374
14      20170828      65.774114  1289.341335     59.922912      24.481232

Answered by jezrael

I believe you first need the parse_dates parameter in read_csv to parse the column to datetime, and then groupby by weekday_name and aggregate (weekday_name was removed in pandas 1.0; dt.day_name() is the replacement):

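import pandas as pd

df_data = pd.read_csv('data.csv', parse_dates=['arrival_date'])

# day_name() replaces weekday_name (removed in pandas 1.0); numeric_only skips the date column
week_df = df_data.groupby(df_data['arrival_date'].dt.day_name()).mean(numeric_only=True)
print (week_df)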
                price_1      price_2    price_3    price_4
arrival_date                                              
Friday        71.952235  1130.106565  64.900476  25.779593
Monday        71.026544  1391.363442  67.362277  25.976418
Saturday      83.149556  1364.312304  76.538815  27.371489
Sunday        57.553097  1223.977557  51.952801  21.552876
Thursday      66.831764  1364.982781  63.940393  23.887128
Tuesday       79.006604  1241.603185  75.360606  28.250994
Wednesday     76.021324  1276.650570  72.485089  23.611288
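
The question also asks for the median and standard deviation, not only the mean. As a minimal sketch, assuming the same grouping key, several statistics can be requested at once with agg. The four rows below are copied from the question's sample (price_1 only), and the variable name stats is just an illustration:

```python
import pandas as pd

# Four rows taken from the question's sample data (two Wednesdays, two Thursdays)
df_data = pd.DataFrame({
    'arrival_date': pd.to_datetime(
        ['20170816', '20170817', '20170823', '20170824'], format='%Y%m%d'),
    'price_1': [75.945298, 68.803269, 76.097351, 64.860259],
})

# One row per weekday name, one column per requested statistic
stats = (
    df_data.groupby(df_data['arrival_date'].dt.day_name())['price_1']
    .agg(['mean', 'median', 'std'])
)
print(stats)
```

Calling agg on the whole grouped frame instead of a single column should give a two-level column index (column name, statistic), which may or may not be what you want for seven rows.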

For numeric index use weekday:

week_df = df_data.groupby(df_data['arrival_date'].dt.weekday).mean(numeric_only=True)
print (week_df)
                price_1      price_2    price_3    price_4
arrival_date                                              
0             71.026544  1391.363442  67.362277  25.976418
1             79.006604  1241.603185  75.360606  28.250994
2             76.021324  1276.650570  72.485089  23.611288
3             66.831764  1364.982781  63.940393  23.887128
4             71.952235  1130.106565  64.900476  25.779593
5             83.149556  1364.312304  76.538815  27.371489
6             57.553097  1223.977557  51.952801  21.552876
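
In the question the dates are the index rather than a column. A small sketch of that case, assuming the index is a DatetimeIndex: the index exposes weekday directly, so no .dt accessor is needed. The frame df_indexed and its three rows (taken from the question's sample) are only for illustration:

```python
import pandas as pd

# Two Mondays and one Tuesday from the question's sample, with the dates as index
dates = pd.to_datetime(['20170821', '20170822', '20170828'], format='%Y%m%d')
df_indexed = pd.DataFrame({'price_1': [76.278975, 79.006604, 65.774114]}, index=dates)

# DatetimeIndex.weekday numbers the days Monday=0 ... Sunday=6
by_weekday = df_indexed.groupby(df_indexed.index.weekday).mean()
print(by_weekday)
```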

EDIT:

For correct ordering add reindex:

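days = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday', 'Sunday']
# day_name() replaces weekday_name (removed in pandas 1.0); numeric_only skips the date column
week_df = df_data.groupby(df_data['arrival_date'].dt.day_name()).mean(numeric_only=True).reindex(days)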
print (week_df)
                price_1      price_2    price_3    price_4
arrival_date                                              
Monday        71.026544  1391.363442  67.362277  25.976418
Tuesday       79.006604  1241.603185  75.360606  28.250994
Wednesday     76.021324  1276.650570  72.485089  23.611288
Thursday      66.831764  1364.982781  63.940393  23.887128
Friday        71.952235  1130.106565  64.900476  25.779593
Saturday      83.149556  1364.312304  76.538815  27.371489
Sunday        57.553097  1223.977557  51.952801  21.552876
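
An alternative to reindex, not part of the original answer, is to turn the day names into an ordered categorical so that groupby itself sorts them in calendar order. This is only a sketch under that assumption; the helper column name weekday is made up for the example, and the three rows come from the question's sample:

```python
import pandas as pd

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# A Friday, a Monday and a Tuesday from the question's sample
df_data = pd.DataFrame({
    'arrival_date': pd.to_datetime(['20170818', '20170821', '20170822'], format='%Y%m%d'),
    'price_1': [73.497144, 76.278975, 79.006604],
})

# Hypothetical helper column: day names as an ordered categorical (Monday first)
df_data['weekday'] = pd.Categorical(
    df_data['arrival_date'].dt.day_name(), categories=days, ordered=True)

# observed=True keeps only weekdays that occur in the data; groups follow category order
week_df = df_data.groupby('weekday', observed=True)['price_1'].mean()
print(week_df)
```

With observed=False the result should instead contain all seven weekdays, with NaN for days that never occur, which is closer to what reindex(days) produces.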