Python Pandas: Group datetime column into hour and minute aggregations

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/16266019/

Time: 2020-08-18 22:09:49  Source: igfitidea

Tags: python, date, pandas

Asked by horatio1701d

This seems like it should be fairly straightforward, but after nearly an entire day I have not found a solution. I've loaded my DataFrame with read_csv and easily parsed, combined and indexed a date column and a time column into one column. Now I want to reshape the data and perform calculations based on hour and minute groupings, similar to what you can do in an Excel pivot table.

I know how to resample to hour or minute, but that keeps the date portion associated with each hour/minute. I want to aggregate the data set ONLY by hour and minute, similar to grouping in an Excel pivot table and selecting "hour" and "minute" but nothing else.

Any help would be greatly appreciated.

Answered by Wes McKinney

Can't you do this, where df is your DataFrame:

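import pandas as pd
times = pd.to_datetime(df.timestamp_col)
# As posted this used times.hour / times.minute; on a Series, pandas needs the .dt accessor
df.groupby([times.dt.hour, times.dt.minute]).value_col.sum()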
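A minimal, self-contained sketch of this grouping on made-up data. The column names follow the answer above; the sample values and the level names "hour" and "minute" are assumptions for illustration. Because `times` is a Series here, the components are read through the `.dt` accessor.

```python
import pandas as pd

# Hypothetical sample data: readings on two dates that share clock times
df = pd.DataFrame({
    "timestamp_col": ["2013-04-28 09:15:00", "2013-04-28 09:15:30",
                      "2013-04-29 09:15:10", "2013-04-29 10:30:00"],
    "value_col": [1, 2, 3, 4],
})

times = pd.to_datetime(df["timestamp_col"])

# Group on hour and minute only, so rows from different dates share a bucket
result = df.groupby([
    times.dt.hour.rename("hour"),
    times.dt.minute.rename("minute"),
])["value_col"].sum()

print(result)
```

The three 09:15 rows land in one group regardless of date, which is the behaviour the question asks for and which resampling does not give.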

Answered by WillZ

I came across this while searching for this type of groupby. Wes' code above didn't work for me; I'm not sure whether that is because of changes in pandas over time.

In pandas 0.16.2, what I did in the end was:

grp = data.groupby(by=[data.datetime_col.map(lambda x: (x.hour, x.minute))])
grp.count()

This gives you (hour, minute) tuples as the group index. If you want a multi-index instead:

grp = data.groupby(by=[data.datetime_col.map(lambda x: x.hour),
                       data.datetime_col.map(lambda x: x.minute)])
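The two variants can be compared side by side in a small sketch. The sample frame, the `value` column and the level names are assumptions, not part of the answer. The second variant uses the `.dt` accessor instead of `map`, which should be equivalent for a datetime column.

```python
import pandas as pd

# Hypothetical sample data with an already-parsed datetime column
data = pd.DataFrame({
    "datetime_col": pd.to_datetime([
        "2015-07-01 08:05:00", "2015-07-02 08:05:45", "2015-07-02 17:40:00",
    ]),
    "value": [10, 20, 30],
})

# One key holding (hour, minute) tuples: each group is labelled by a tuple
by_tuple = data.groupby(
    data["datetime_col"].map(lambda x: (x.hour, x.minute))
)["value"].count()

# Two separate keys: the result carries a two-level index (hour, minute)
by_levels = data.groupby([
    data["datetime_col"].dt.hour.rename("hour"),
    data["datetime_col"].dt.minute.rename("minute"),
])["value"].count()

print(by_tuple)
print(by_levels)
```

The two-level form is usually easier to reshape afterwards, for example with `by_levels.unstack("minute")` to get hours as rows and minutes as columns.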

Answered by Nix G-D

Wes' code didn't work for me, but building a DatetimeIndex (docs) did:

times = pd.DatetimeIndex(data.datetime_col)
grouped = data.groupby([times.hour, times.minute])

The DatetimeIndex object is pandas' representation of a sequence of times. The first line creates an array of the datetimes. The second line uses this array to get the hour and minute of every row, allowing the data to be grouped (docs) by these values.
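As a rough illustration of those two lines, here is a sketch on invented data; the `value` column and the use of `mean()` are assumptions. A DatetimeIndex exposes `hour` and `minute` directly, without the `.dt` accessor that a Series needs.

```python
import pandas as pd

# Hypothetical sample data; the datetime column is still plain strings
data = pd.DataFrame({
    "datetime_col": ["2014-01-01 23:59:00", "2014-01-02 23:59:30",
                     "2014-01-02 00:01:00"],
    "value": [1.0, 2.0, 4.0],
})

# Parse the column into a DatetimeIndex, one entry per row
times = pd.DatetimeIndex(data["datetime_col"])

# times.hour and times.minute are arrays aligned with the rows of `data`
grouped = data.groupby([times.hour, times.minute])["value"].mean()

print(grouped)
```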

Answered by tsando

Here is an alternative to Wes' and Nix's answers above that takes just one line of code. Assuming your column is already a datetime column, you don't need to get the hour and minute attributes separately:

df.groupby(df.timestamp_col.dt.time).value_col.sum()
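One thing to watch with this one-liner: `.dt.time` keeps the seconds, so rows in the same minute but at different seconds fall into different groups. The sketch below uses made-up data to show this, along with one possible workaround (flooring to the minute first), which is an addition and not part of the answer.

```python
import datetime
import pandas as pd

# Hypothetical sample data with an already-parsed datetime column
df = pd.DataFrame({
    "timestamp_col": pd.to_datetime([
        "2020-01-01 09:15:00", "2020-01-02 09:15:00", "2020-01-02 09:15:30",
    ]),
    "value_col": [1, 2, 3],
})

# Grouping by time of day: 09:15:00 and 09:15:30 are separate groups
by_time = df.groupby(df["timestamp_col"].dt.time)["value_col"].sum()

# Flooring to the minute first gives one group per hour:minute
by_minute = df.groupby(
    df["timestamp_col"].dt.floor("min").dt.time
)["value_col"].sum()

print(by_time)
print(by_minute)
```

If the timestamps are already at whole-minute resolution, the two results are the same and the original one-liner is enough.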