Python: Truncate `TimeStamp` column to hour precision in pandas `DataFrame`

Question

Asked by Jon Clements

I have a pandas.DataFrame called df which has an automatically generated index, with a column dt:

df['dt'].dtype, df['dt'][0]
# (dtype('<M8[ns]'), Timestamp('2014-10-01 10:02:45'))

What I'd like to do is create a new column truncated to hour precision. I'm currently using:

from datetime import datetime
df['dt2'] = df['dt'].apply(lambda L: datetime(L.year, L.month, L.day, L.hour))

This works, so that's fine. However, I've an inkling there's some nice way using pandas.tseries.offsets or creating a DatetimeIndex or similar.

So if possible, is there some pandas wizardry to do this?

Answer 1

Accepted answer by Alex Riley

In pandas 0.18.0 and later, there are datetime floor, ceil and round methods to round timestamps to a given fixed precision/frequency. To round down to hour precision, you can use:

>>> df['dt2'] = df['dt'].dt.floor('h')
>>> df
                      dt                     dt2
0    2014-10-01 10:02:45     2014-10-01 10:00:00
1    2014-10-01 13:08:17     2014-10-01 13:00:00
2    2014-10-01 17:39:24     2014-10-01 17:00:00
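
For comparison, here is a minimal sketch of all three methods side by side, rebuilding the example frame from the timestamps shown above. The variable names floored, ceiled and rounded are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the timestamps shown above
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-10-01 13:08:17",
        "2014-10-01 17:39:24",
    ])
})

floored = df["dt"].dt.floor("h")  # always rounds down to the start of the hour
ceiled = df["dt"].dt.ceil("h")    # always rounds up to the next full hour
rounded = df["dt"].dt.round("h")  # rounds to the nearest full hour

print(pd.DataFrame({"floor": floored, "ceil": ceiled, "round": rounded}))
```

Only floor truncates; round moves 17:39:24 forward to 18:00:00, which is usually not what you want when bucketing by hour.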

Here's another alternative to truncate the timestamps. Unlike floor, it supports truncating to a precision such as year or month.

You can temporarily adjust the precision unit of the underlying NumPy datetime64 datatype, changing it from [ns] to [h]:

df['dt'].values.astype('<M8[h]')

This truncates everything to hour precision. For example:

>>> df
                       dt
0     2014-10-01 10:02:45
1     2014-10-01 13:08:17
2     2014-10-01 17:39:24

>>> df['dt2'] = df['dt'].values.astype('<M8[h]')
>>> df
                      dt                     dt2
0    2014-10-01 10:02:45     2014-10-01 10:00:00
1    2014-10-01 13:08:17     2014-10-01 13:00:00
2    2014-10-01 17:39:24     2014-10-01 17:00:00

>>> df.dtypes
dt     datetime64[ns]
dt2    datetime64[ns]

The same method should work for any other unit: months 'M', minutes 'm', and so on:

Keep up to year: '<M8[Y]'
Keep up to month: '<M8[M]'
Keep up to day: '<M8[D]'
Keep up to minute: '<M8[m]'
Keep up to second: '<M8[s]'
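
As a rough sketch of the month and year cases, which floor does not cover, the example below uses made-up timestamps spanning several months. Casting back to datetime64[ns] explicitly is an addition here, not part of the original answer; it is meant to keep the resulting column dtype predictable regardless of how a given pandas version stores coarser units.

```python
import pandas as pd

# Illustrative timestamps spanning several months (not from the original question)
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-11-17 13:08:17",
        "2015-02-28 17:39:24",
    ])
})

values = df["dt"].values  # underlying NumPy datetime64 array

# Cast to month (or year) precision to drop the finer fields,
# then cast back to nanoseconds for a regular datetime column
df["month"] = values.astype("datetime64[M]").astype("datetime64[ns]")
df["year"] = values.astype("datetime64[Y]").astype("datetime64[ns]")

print(df)
```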

Answer 2

Answered by David Hagan

A method I've used in the past to accomplish this goal was the following (quite similar to what you're already doing, but thought I'd throw it out there anyway):

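# also reset the sub-second fields, otherwise fractional seconds survive
df['dt2'] = df['dt'].apply(lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0))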
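
One thing to watch with the replace approach is that it only resets the fields it is given. The sketch below uses a made-up timestamp with fractional seconds to compare it against dt.floor('h'). The names partial, full and floored are illustrative only.

```python
import pandas as pd

# Illustrative data: the first timestamp carries fractional seconds
s = pd.Series([
    pd.Timestamp("2014-10-01 10:02:45.123456"),
    pd.Timestamp("2014-10-01 13:08:17"),
])

# Resetting only minute and second leaves the microseconds in place
partial = s.apply(lambda x: x.replace(minute=0, second=0))

# Resetting every field below the hour gives a true truncation
full = s.apply(lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0))

# Vectorized equivalent for comparison
floored = s.dt.floor("h")

print(partial.iloc[0])           # still has the .123456 fraction
print((full == floored).all())   # the two truncations agree
```

If the data never has sub-second components, both replace variants give the same result; the vectorized floor is likely to be faster on large columns since it avoids a Python-level call per row.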
