Original question: http://stackoverflow.com/questions/14301004/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Group by multiple time units in pandas data frame
Asked by metakermit
I have a data frame that consists of time series data at 15-second intervals:
date_time value
2012-12-28 11:11:00 103.2
2012-12-28 11:11:15 103.1
2012-12-28 11:11:30 103.4
2012-12-28 11:11:45 103.5
2012-12-28 11:12:00 103.3
The data spans many years. I would like to group by both year and time of day to see how the time-of-day effect is distributed across years. For example, I may want to compute the mean and standard deviation of every 15-second interval across days, and see how those means and standard deviations change across 2010, 2011, 2012, etc. I naively tried data.groupby(lambda x: [x.year, x.time]) but it didn't work. How can I do such grouping?
Answered by metakermit
In case date_time is not your index, a date_time-indexed DataFrame could be created with:
dfts = df.set_index('date_time')
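A minimal sketch of that step, using the five sample rows from the question. It assumes the date_time column arrives as text (the question does not say), so it is parsed with pd.to_datetime before becoming the index:

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "date_time": [
        "2012-12-28 11:11:00",
        "2012-12-28 11:11:15",
        "2012-12-28 11:11:30",
        "2012-12-28 11:11:45",
        "2012-12-28 11:12:00",
    ],
    "value": [103.2, 103.1, 103.4, 103.5, 103.3],
})

# Assumption: date_time is stored as strings, so convert it to datetimes
# first; otherwise attributes such as .year and .month are not available.
df["date_time"] = pd.to_datetime(df["date_time"])
dfts = df.set_index("date_time")
```

If the column already has a datetime dtype, the pd.to_datetime call is harmless and set_index alone is enough.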
From there you can group by intervals using
dfts.groupby(lambda x: x.month).mean()
to see the mean value for each month. Similarly, you can do
dfts.groupby(lambda x: x.year).std()
to get the standard deviation for each year.
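As a rough illustration of the two groupings, the sketch below uses synthetic daily data (an assumption, not the question's data set). It passes the index's month and year attributes as group keys, which should produce the same groups as the lambda form, and computes mean and standard deviation together with agg:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: one reading per day over two years.
idx = pd.date_range("2012-01-01", "2013-12-31", freq="D")
dfts = pd.DataFrame({"value": np.arange(len(idx), dtype=float)}, index=idx)

# Same groups as groupby(lambda x: x.month), but the key is computed
# for the whole index at once instead of label by label.
monthly_mean = dfts.groupby(dfts.index.month).mean()

# Mean and standard deviation per year in a single call.
yearly_stats = dfts.groupby(dfts.index.year)["value"].agg(["mean", "std"])
```

Note that grouping by month alone pools the same calendar month from every year, which is why a second step is needed to compare years.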
If I understood the example task correctly, you could split the data into years using xs, group each year's data, concatenate the results, and store them in a new DataFrame.
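import pandas as pd
years = range(2012, 2015)
# Mean value per calendar month, computed separately for each year
yearly_month_stats = [dfts.xs(str(year)).groupby(lambda x: x.month).mean() for year in years]
# Place the yearly results side by side, with the year as the outer column key
df2 = pd.concat(yearly_month_stats, axis=1, keys=years)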
This gives something like:
        2012       2013       2014
       value      value      value
1        NaN   5.324165  15.747767
2        NaN -23.193429   9.193217
3        NaN -14.144287  23.896030
4        NaN -21.877975  16.310195
5        NaN  -3.079910  -6.093905
6        NaN  -2.106847 -23.253183
7        NaN  10.644636   6.542562
8        NaN  -9.763087  14.335956
9        NaN  -3.529646   2.607973
10       NaN -18.633832   0.083575
11       NaN  10.297902  14.059286
12  33.95442  13.692435  22.293245
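The same split-group-concat idea can probably be applied to the year and time-of-day grouping asked about in the question. The sketch below is not part of the original answer: it uses synthetic data and passes two key arrays (the index's year and time attributes) to a single groupby, instead of a lambda that returns a list. The names keys, stats and mean_by_year are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: four 15-second readings on two days in each of two years.
days = ["2012-12-27", "2012-12-28", "2013-12-27", "2013-12-28"]
idx = pd.DatetimeIndex(
    [ts for day in days
     for ts in pd.date_range(f"{day} 11:11:00", periods=4, freq="15s")]
)
dfts = pd.DataFrame({"value": np.arange(len(idx), dtype=float)}, index=idx)

# Two parallel key arrays: the calendar year and the time of day of each row.
keys = [dfts.index.year, dfts.index.time]
stats = dfts.groupby(keys)["value"].agg(["mean", "std"])
stats.index.names = ["year", "time"]

# One row per 15-second slot, one column per year.
mean_by_year = stats["mean"].unstack("year")
```

The resulting layout mirrors the table above, with time-of-day slots as rows in place of months, so the years can be compared column by column.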

