Original URL: http://stackoverflow.com/questions/28773342/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): Stack Overflow
Truncate `TimeStamp` column to hour precision in pandas `DataFrame`
Asked by Jon Clements
I have a `pandas.DataFrame` called `df` which has an automatically generated index, with a column `dt`:
df['dt'].dtype, df['dt'][0]
# (dtype('<M8[ns]'), Timestamp('2014-10-01 10:02:45'))
What I'd like to do is create a new column truncated to hour precision. I'm currently using:
from datetime import datetime
df['dt2'] = df['dt'].apply(lambda L: datetime(L.year, L.month, L.day, L.hour))
This works, so that's fine. However, I've an inkling there's some nice way using `pandas.tseries.offsets` or creating a `DatetimeIndex` or similar.
So if possible, is there some `pandas` wizardry to do this?
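For reference, here is a self-contained sketch of the `apply` approach from the question. The three sample timestamps are assumed for illustration (they match the ones used in the answers below):

```python
from datetime import datetime

import pandas as pd

# Illustrative sample data (assumed), using the timestamps shown in the answers
df = pd.DataFrame({"dt": pd.to_datetime([
    "2014-10-01 10:02:45",
    "2014-10-01 13:08:17",
    "2014-10-01 17:39:24",
])})

# Rebuild each Timestamp from its year, month, day and hour fields only,
# which drops the minutes, seconds and any sub-second part
df["dt2"] = df["dt"].apply(lambda L: datetime(L.year, L.month, L.day, L.hour))
print(df)
```

This calls a Python function once per row, which is why a vectorised pandas method is attractive for large frames.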
Accepted answer by Alex Riley
In pandas 0.18.0 and later, there are datetime `floor`, `ceil` and `round` methods to round timestamps to a given fixed precision/frequency. To round down to hour precision, you can use:
>>> df['dt2'] = df['dt'].dt.floor('h')
>>> df
                   dt                 dt2
0 2014-10-01 10:02:45 2014-10-01 10:00:00
1 2014-10-01 13:08:17 2014-10-01 13:00:00
2 2014-10-01 17:39:24 2014-10-01 17:00:00
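The `ceil` and `round` methods mentioned above take the same frequency argument. A small sketch comparing all three on the same timestamps (assumed sample data, copied from the output above):

```python
import pandas as pd

# Sample timestamps (assumed), taken from the example output above
s = pd.Series(pd.to_datetime([
    "2014-10-01 10:02:45",
    "2014-10-01 13:08:17",
    "2014-10-01 17:39:24",
]))

out = pd.DataFrame({
    "dt": s,
    "floor": s.dt.floor("h"),  # always rounds down to the hour
    "ceil": s.dt.ceil("h"),    # always rounds up to the next hour
    "round": s.dt.round("h"),  # rounds to the nearest hour
})
print(out)
```

Only `floor` matches the truncation asked for in the question: `round` sends 17:39:24 up to 18:00, and `ceil` moves every value here to the following hour.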
Here's another alternative to truncate the timestamps. Unlike `floor`, which only accepts fixed frequencies, it supports truncating to a precision such as year or month.
You can temporarily adjust the precision unit of the underlying NumPy `datetime64` datatype, changing it from `[ns]` to `[h]`:
df['dt'].values.astype('<M8[h]')
This truncates everything to hour precision. For example:
>>> df
                   dt
0 2014-10-01 10:02:45
1 2014-10-01 13:08:17
2 2014-10-01 17:39:24
>>> df['dt2'] = df['dt'].values.astype('<M8[h]')
>>> df
                   dt                 dt2
0 2014-10-01 10:02:45 2014-10-01 10:00:00
1 2014-10-01 13:08:17 2014-10-01 13:00:00
2 2014-10-01 17:39:24 2014-10-01 17:00:00
>>> df.dtypes
dt datetime64[ns]
dt2 datetime64[ns]
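A self-contained sketch of the same NumPy cast, assuming the sample frame shown above. pandas does not store hour resolution directly, so the array is converted again when it is assigned to the frame; the resulting unit (shown as `[ns]` above) may differ between pandas versions:

```python
import pandas as pd

# Illustrative sample data (assumed), matching the frame above
df = pd.DataFrame({"dt": pd.to_datetime([
    "2014-10-01 10:02:45",
    "2014-10-01 13:08:17",
    "2014-10-01 17:39:24",
])})

# .values exposes the underlying NumPy datetime64 array; casting it to
# hour resolution discards everything below the hour
hours = df["dt"].values.astype("<M8[h]")
print(hours.dtype)

# Assigning the array back converts it to a resolution pandas supports
df["dt2"] = hours
print(df)
```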
The same method should work for any other unit: months `'M'`, minutes `'m'`, and so on:
- Keep up to year: `'<M8[Y]'`
- Keep up to month: `'<M8[M]'`
- Keep up to day: `'<M8[D]'`
- Keep up to minute: `'<M8[m]'`
- Keep up to second: `'<M8[s]'`
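To see what each unit in the list keeps, the following sketch casts one assumed sample timestamp (with a fractional second added) to every unit and prints the NumPy value, whose string form shows only the fields that survive:

```python
import pandas as pd

# Hypothetical sample value with sub-second precision
values = pd.Series(pd.to_datetime(["2014-10-01 10:02:45.123"])).values

# NumPy datetime64 unit codes from the list above
units = {"year": "Y", "month": "M", "day": "D", "hour": "h", "minute": "m", "second": "s"}

for name, code in units.items():
    truncated = values.astype(f"<M8[{code}]")
    # A datetime64 scalar prints only the fields kept by its unit
    print(f"{name:>6}: {truncated[0]}")
```

Note that the codes are case-sensitive: `'M'` is month while `'m'` is minute.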
Answer by David Hagan
A method I've used in the past to accomplish this goal was the following (quite similar to what you're already doing, but thought I'd throw it out there anyway):
df['dt2'] = df['dt'].apply(lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0))