pandas: grouping by multiple time units in a pandas data frame

Question

Asked by metakermit

I have a data frame that consists of time series data at 15-second intervals:

date_time             value    
2012-12-28 11:11:00   103.2
2012-12-28 11:11:15   103.1
2012-12-28 11:11:30   103.4
2012-12-28 11:11:45   103.5
2012-12-28 11:12:00   103.3

The data spans many years. I would like to group by both year and time of day to look at how the time-of-day effect is distributed over the years. For example, I may want to compute the mean and standard deviation of every 15-second interval across days, and then look at how those means and standard deviations change from 2010 to 2011, 2012, and so on. I naively tried data.groupby(lambda x: [x.year, x.time]) but it didn't work. How can I do such grouping?

Answer 1

Answered by metakermit

In case date_time is not your index, a date_time-indexed DataFrame can be created with:

dfts = df.set_index('date_time')
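The grouping functions below read attributes such as .year and .month from each index value, so the index has to hold real timestamps rather than strings. A minimal sketch, assuming date_time was loaded as text; the sample rows are hypothetical and only mirror the layout in the question:

```python
import pandas as pd

# Hypothetical sample frame mirroring the question's layout.
df = pd.DataFrame({
    "date_time": [
        "2012-12-28 11:11:00",
        "2012-12-28 11:11:15",
        "2012-12-28 11:11:30",
    ],
    "value": [103.2, 103.1, 103.4],
})

# Parse the strings into timestamps so the index exposes .year, .month, etc.
df["date_time"] = pd.to_datetime(df["date_time"])

# Move the timestamp column into the index.
dfts = df.set_index("date_time")
```

If the column is already a datetime dtype, the pd.to_datetime step should be unnecessary.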

From there you can group by intervals using

dfts.groupby(lambda x : x.month).mean()

to see mean values for each month. Similarly, you can do

dfts.groupby(lambda x : x.year).std()

for standard deviations across the years.
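As a small illustration of the two calls above, here is a sketch on made-up data (the timestamps and values are assumptions, not the asker's data). It also shows grouping by dfts.index.year directly, which should be equivalent to the lambda form:

```python
import pandas as pd

# Hypothetical data: two readings in January 2012, two in February 2013.
index = pd.to_datetime([
    "2012-01-01 00:00:00",
    "2012-01-01 00:00:15",
    "2013-02-01 00:00:00",
    "2013-02-01 00:00:15",
])
dfts = pd.DataFrame({"value": [1.0, 3.0, 10.0, 20.0]}, index=index)

# The function is applied to each index value (a Timestamp).
monthly_mean = dfts.groupby(lambda ts: ts.month).mean()

# Vectorized alternative: pass an array of keys taken from the index.
yearly_std = dfts.groupby(dfts.index.year).std()
```

Note that grouping by month alone pools the same month from different years, which is why the answer goes on to split by year.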

If I understood the example task correctly, you could split the data into years using xs, group each year's data, then concatenate the results and store them in a new DataFrame:

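import pandas as pd

years = range(2012, 2015)
# One frame of monthly means per year, then placed side by side with the year as the column key
yearly_month_stats = [dfts.xs(str(year)).groupby(lambda x: x.month).mean() for year in years]
df2 = pd.concat(yearly_month_stats, axis=1, keys=years)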

This gives something like:

        2012       2013       2014
       value      value      value
1        NaN   5.324165  15.747767
2        NaN -23.193429   9.193217
3        NaN -14.144287  23.896030
4        NaN -21.877975  16.310195
5        NaN  -3.079910  -6.093905
6        NaN  -2.106847 -23.253183
7        NaN  10.644636   6.542562
8        NaN  -9.763087  14.335956
9        NaN  -3.529646   2.607973
10       NaN -18.633832   0.083575
11       NaN  10.297902  14.059286
12  33.95442  13.692435  22.293245
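A table of this shape (months as rows, years as columns) can probably also be built without an explicit loop, by grouping on both keys at once and unstacking the year level. This is an alternative sketch rather than the method used in the answer, and the sample data is hypothetical:

```python
import pandas as pd

# Hypothetical data: three readings in 2013 and one in 2014.
index = pd.to_datetime(["2013-01-10", "2013-01-20", "2013-02-10", "2014-01-10"])
dfts = pd.DataFrame({"value": [1.0, 3.0, 5.0, 7.0]}, index=index)

# Group by (month, year) in one pass; the result has a two-level index.
monthly = dfts.groupby([dfts.index.month, dfts.index.year])["value"].mean()

# Pivot the year level into columns; missing month/year pairs become NaN.
df2 = monthly.unstack(level=1)
```

Unlike the pd.concat version, the columns here are a single level of years, because a single column ("value") was selected before aggregating.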

Answer 2

Answered by joeb1415

You were close:

data.groupby([lambda x: x.year, lambda x: x.time()])

Also be sure to set date_time as the index, as in kermit666's answer.
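To make the year-and-time grouping concrete, here is a minimal sketch that computes the mean and standard deviation of each 15-second slot per year, as the question asks. The data is hypothetical, and the sketch calls time() on each Timestamp because time is a method there; the vectorized form with data.index.time is an alternative spelling assumed to give the same groups:

```python
import pandas as pd

# Hypothetical data: two days in each of two years, two 15-second slots per day.
index = pd.to_datetime([
    "2011-03-01 11:11:00", "2011-03-01 11:11:15",
    "2011-03-02 11:11:00", "2011-03-02 11:11:15",
    "2012-03-01 11:11:00", "2012-03-01 11:11:15",
    "2012-03-02 11:11:00", "2012-03-02 11:11:15",
])
data = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]}, index=index
)

# One grouping function per key; each receives a Timestamp from the index.
grouped = data.groupby([lambda ts: ts.year, lambda ts: ts.time()])
stats = grouped["value"].agg(["mean", "std"])

# Put years in columns so each time-of-day slot can be compared across years.
by_year = stats.unstack(level=0)

# Vectorized alternative using arrays of keys derived from the index.
alt_mean = data.groupby([data.index.year, data.index.time])["value"].mean()
```

Here by_year has one row per time of day and a (statistic, year) column pair for each year, which makes year-over-year changes in the mean and standard deviation easy to read off.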
