Python: Truncate `TimeStamp` column to hour precision in pandas `DataFrame`

Notice: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/28773342/

Time: 2020-08-19 03:43:56  Source: igfitidea

Truncate `TimeStamp` column to hour precision in pandas `DataFrame`

python pandas datetime dataframe

Asked by Jon Clements

I have a `pandas.DataFrame` called `df` which has an automatically generated index, with a column `dt`:

df['dt'].dtype, df['dt'][0]
# (dtype('<M8[ns]'), Timestamp('2014-10-01 10:02:45'))

What I'd like to do is create a new column truncated to hour precision. I'm currently using:

from datetime import datetime

df['dt2'] = df['dt'].apply(lambda L: datetime(L.year, L.month, L.day, L.hour))

This works, so that's fine. However, I've an inkling there's some nice way using `pandas.tseries.offsets` or creating a `DatetimeIndex` or similar.

So if possible, is there some `pandas` wizardry to do this?

Accepted answer by Alex Riley

In pandas 0.18.0 and later, there are datetime `floor`, `ceil` and `round` methods to round timestamps to a given fixed precision/frequency. To round down to hour precision, you can use:

>>> df['dt2'] = df['dt'].dt.floor('h')
>>> df
                      dt                     dt2
0    2014-10-01 10:02:45     2014-10-01 10:00:00
1    2014-10-01 13:08:17     2014-10-01 13:00:00
2    2014-10-01 17:39:24     2014-10-01 17:00:00
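
For comparison, here is a minimal self-contained sketch of all three methods side by side. The sample frame is reconstructed from the output shown above, and the column names `floor`, `ceil` and `round` are chosen only for illustration.

```python
import pandas as pd

# Sample data reconstructed from the output above (an assumption,
# not the asker's real data).
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-10-01 13:08:17",
        "2014-10-01 17:39:24",
    ])
})

# floor always goes down, ceil always goes up, round goes to the nearest hour.
df["floor"] = df["dt"].dt.floor("h")
df["ceil"] = df["dt"].dt.ceil("h")
df["round"] = df["dt"].dt.round("h")

print(df)
```

Only `floor` truncates; `round` would move `17:39:24` up to `18:00:00`, which is usually not what "truncate" means.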


Here's another alternative to truncate the timestamps. Unlike `floor`, it supports truncating to a precision such as year or month.

You can temporarily adjust the precision unit of the underlying NumPy `datetime64` datatype, changing it from `[ns]` to `[h]`:

df['dt'].values.astype('<M8[h]')

This truncates everything to hour precision. For example:

>>> df
                       dt
0     2014-10-01 10:02:45
1     2014-10-01 13:08:17
2     2014-10-01 17:39:24

>>> df['dt2'] = df['dt'].values.astype('<M8[h]')
>>> df
                      dt                     dt2
0    2014-10-01 10:02:45     2014-10-01 10:00:00
1    2014-10-01 13:08:17     2014-10-01 13:00:00
2    2014-10-01 17:39:24     2014-10-01 17:00:00

>>> df.dtypes
dt     datetime64[ns]
dt2    datetime64[ns]

The same method should work for any other unit: months `'M'`, minutes `'m'`, and so on:

  • Keep up to year: '<M8[Y]'
  • Keep up to month: '<M8[M]'
  • Keep up to day: '<M8[D]'
  • Keep up to minute: '<M8[m]'
  • Keep up to second: '<M8[s]'
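
As a rough sketch of the month and year cases, which `floor` cannot express as a fixed frequency: the dates below are made up for illustration, and the explicit cast back to `datetime64[ns]` is an addition here, not part of the answer. It is included because newer pandas versions may otherwise store the column in a coarser unit than `[ns]`.

```python
import pandas as pd

# Hypothetical sample spanning several months and years.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-12-25 13:08:17",
        "2015-03-17 17:39:24",
    ])
})

values = df["dt"].values  # NumPy array of datetime64[ns]

# Casting to a coarser unit drops everything below that unit;
# casting back to [ns] keeps the column dtype predictable.
df["month"] = values.astype("datetime64[M]").astype("datetime64[ns]")
df["year"] = values.astype("datetime64[Y]").astype("datetime64[ns]")

print(df)
```

Here `'datetime64[M]'` is the same dtype as `'<M8[M]'` on little-endian machines, just spelled out in full.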

Answer by David Hagan

A method I've used in the past to accomplish this goal was the following (quite similar to what you're already doing, but thought I'd throw it out there anyway):

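# Zero the sub-second fields too, otherwise they survive the truncation.
df['dt2'] = df['dt'].apply(lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0))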
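
One thing to watch with `replace`: fields that are not named in the call are left untouched, so any sub-second component is kept unless it is zeroed as well. A small sketch with a made-up timestamp (the value and variable names are only illustrative):

```python
import pandas as pd

# Hypothetical timestamp carrying microseconds.
ts = pd.Timestamp("2014-10-01 10:02:45.123456")

# Only minute and second are reset; the microseconds remain.
partial = ts.replace(minute=0, second=0)

# Resetting every field below the hour gives a true truncation.
full = ts.replace(minute=0, second=0, microsecond=0, nanosecond=0)

print(partial)
print(full)
print(full == ts.floor("h"))
```

Since `apply` calls the lambda once per row, this approach is generally slower on large frames than the vectorised `dt.floor('h')`.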