Original question: http://stackoverflow.com/questions/43585988/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
python pandas sum by hour of day
Asked by julia_3010
I'm working with a dataset of hourly counts (df). The DataFrame has 8784 rows (hourly, for the year 2016).
I'd like to see if there are daily trends (e.g. whether there is an increase in the morning hours). For this I'd like to create a plot that has the hour of the day (from 0 to 24) on the x-axis and the number of cyclists on the y-axis (something like the picture from http://ofdataandscience.blogspot.co.uk/2013/03/capital-bikeshare-time-series-clustering.html).
I experimented with different ways of using pivot, resample and set_index and plotting the result with matplotlib, without success. In other words, I couldn't find a way to sum up every observation at a certain hour and then plot those sums for each weekday.
Any ideas how to do this? Thanks in advance!
Answered by jezrael
I think you can use groupby by hour and weekday and aggregate with sum (or maybe mean), then reshape with unstack and plot with DataFrame.plot:
ax = df.groupby([df['Date'].dt.hour, 'weekday'])['Cyclists'].sum().unstack().plot()
Solution with pivot_table:
ax = df.pivot_table(index=df['Date'].dt.hour,
                    columns='weekday',
                    values='Cyclists',
                    aggfunc='sum').plot()
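The two one-liners above should produce the same hour-by-weekday table before plotting. The following is a minimal sketch that checks this on made-up data; the two weeks of random counts, the start date and the `hour` index name are assumptions for illustration, not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two full weeks of hourly counts (336 rows).
rng = pd.date_range('2016-01-04', periods=24 * 14, freq='h')
counts = np.random.default_rng(0).integers(0, 100, size=len(rng))
df = pd.DataFrame({'Date': rng, 'Cyclists': counts})
df['weekday'] = df['Date'].dt.day_name()

# Hour of day (0-23), named so the resulting index has a readable label.
hour = df['Date'].dt.hour.rename('hour')

# Variant 1: group by hour and weekday, sum, then move weekday into columns.
by_groupby = df.groupby([hour, 'weekday'])['Cyclists'].sum().unstack()

# Variant 2: let pivot_table do the grouping and reshaping in one call.
by_pivot = df.pivot_table(index=hour, columns='weekday',
                          values='Cyclists', aggfunc='sum')

# Both tables have one row per hour and one column per weekday.
print(by_groupby.shape)
print((by_groupby.to_numpy() == by_pivot.to_numpy()).all())
```

Which variant to use is mostly a matter of taste; groupby followed by unstack keeps the intermediate Series available if you want to inspect it.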
Sample:
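import numpy as np
import pandas as pd

N = 200
np.random.seed(100)
# 'h' = hourly frequency (written 'H' in the original answer)
rng = pd.date_range('2016-01-01', periods=N, freq='h')
df = pd.DataFrame({'Date': rng, 'Cyclists': np.random.randint(100, size=N)})
# dt.weekday_name was removed from pandas; dt.day_name() returns the same names
df['weekday'] = df['Date'].dt.day_name()
print(df.head())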
   Cyclists                Date weekday
0         8 2016-01-01 00:00:00  Friday
1        24 2016-01-01 01:00:00  Friday
2        67 2016-01-01 02:00:00  Friday
3        87 2016-01-01 03:00:00  Friday
4        79 2016-01-01 04:00:00  Friday
print (df.groupby([df['Date'].dt.hour, 'weekday'])['Cyclists'].sum().unstack())
weekday  Friday  Monday  Saturday  Sunday  Thursday  Tuesday  Wednesday
Date
0           102      91       120      53        95       86         21
1           102      83       100      27        20       94         25
2           121      53       105      56        10       98         54
3           164      78        54      30         8       42          6
4           163       0        43      48        89       84         37
5            49      13       150      47        72       95         58
6            24      57        32      39        30       76         39
7           127      76       128      38        12       33         94
8            72       3        59      44        18       58         51
9           138      70        67      18        93       42         30
10           77       3         7      64        92       22         66
11          159      84        49      56        44        0         24
12          156      79        47      34        57       55         55
13           42      10        65      53         0       98         17
14          116      87        61      74        73       19         45
15          106      60        14      17        54       53         89
16           22       3        55      72        92       68         45
17          154      48        71      13        66       62         35
18           60      52        80      30        16       50         16
19           79      43         2      17         5       68         12
20           11      36        94      53        51       35         86
21          180       5        19      68        90       23         82
22          103      71        98      50        34        9         67
23           92      38        63      91        67       48         92
df.groupby([df['Date'].dt.hour, 'weekday'])['Cyclists'].sum().unstack().plot()
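The answer mentions mean as an alternative to sum. That matters when some weekdays occur more often in the data than others, since their sums are larger for that reason alone. Below is a rough sketch of the mean variant with labelled axes; the ten days of random data, the axis labels and the Agg backend are assumptions made so the example runs on its own.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no display is needed
import numpy as np
import pandas as pd

# Hypothetical sample: ten days of hourly counts starting on a Friday,
# so Friday, Saturday and Sunday occur twice and the other days once.
rng = pd.date_range('2016-01-01', periods=24 * 10, freq='h')
counts = np.random.default_rng(1).integers(0, 100, size=len(rng))
df = pd.DataFrame({'Date': rng, 'Cyclists': counts})
df['weekday'] = df['Date'].dt.day_name()

# Mean count per hour of day and weekday; weekdays become columns.
hourly_mean = (df.groupby([df['Date'].dt.hour, 'weekday'])['Cyclists']
                 .mean()
                 .unstack())

# One line per weekday, hour of day on the x-axis.
ax = hourly_mean.plot(figsize=(10, 5))
ax.set_xlabel('Hour of day')
ax.set_ylabel('Mean number of cyclists')
ax.set_xticks(range(24))
```

In an interactive session you would drop the `matplotlib.use('Agg')` line and call `matplotlib.pyplot.show()` to see the figure.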
EDIT:
You can also convert weekday to categorical so that the columns are sorted by day of the week instead of alphabetically:
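names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
# astype('category', categories=...) is no longer accepted by pandas; build the ordered categorical directly
df['weekday'] = pd.Categorical(df['weekday'], categories=names, ordered=True)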
df.groupby([df['Date'].dt.hour, 'weekday'])['Cyclists'].sum().unstack().plot()
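To check the effect of the categorical conversion without relying on the earlier sample, here is a self-contained sketch that uses `pd.CategoricalDtype`, which is one of several equivalent ways to declare the order. The one week of random data is an assumption, and `observed=False` is passed explicitly only to keep every weekday category in the result.

```python
import numpy as np
import pandas as pd

names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
         'Friday', 'Saturday', 'Sunday']

# Hypothetical sample: one week of hourly counts starting on a Friday.
rng = pd.date_range('2016-01-01', periods=24 * 7, freq='h')
counts = np.random.default_rng(2).integers(0, 100, size=len(rng))
df = pd.DataFrame({'Date': rng, 'Cyclists': counts})

# Ordered categorical: sorting follows the order in `names`, not the alphabet.
weekday_type = pd.CategoricalDtype(categories=names, ordered=True)
df['weekday'] = df['Date'].dt.day_name().astype(weekday_type)

table = (df.groupby([df['Date'].dt.hour, 'weekday'], observed=False)['Cyclists']
           .sum()
           .unstack())

# Columns now run Monday through Sunday.
print(list(table.columns))
```

Calling `.plot()` on `table` then lists the weekdays in calendar order in the legend.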