Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/13785932/

Time: 2020-09-13 20:31:37  Source: igfitidea

How to round a Pandas `DatetimeIndex`?

Tags: date, datetime, numpy, pandas, date-format

Asked by Yariv

I have a pandas.DatetimeIndex, e.g.:

pd.date_range('2012-1-1 02:03:04.000',periods=3,freq='1ms')
>>> [2012-01-01 02:03:04, ..., 2012-01-01 02:03:04.002000]

I would like to round the dates (Timestamps) to the nearest second. How do I do that? The expected result is similar to:

[2012-01-01 02:03:04.000000, ..., 2012-01-01 02:03:04.000000]

Is it possible to accomplish this by rounding a NumPy datetime64[ns] to seconds without changing the dtype [ns]?

np.array(['2012-01-02 00:00:00.001'],dtype='datetime64[ns]')

Answered by Andy Hayden

Update: if you're doing this to a DatetimeIndex / datetime64 column, a better way is to use np.round directly rather than via an apply/map:

np.round(dtindex_or_datetime_col.astype(np.int64), -9).astype('datetime64[ns]')
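The same idea can be sketched with plain integer arithmetic on the nanosecond values. This is an illustrative variant, not the one-liner above: it rounds ties upward (add half a second, then floor), and the names `idx`, `NS_PER_SECOND`, `ns`, `rounded_ns` and `rounded` are made up for the example.

```python
import numpy as np
import pandas as pd

NS_PER_SECOND = 1_000_000_000

idx = pd.date_range('2012-01-01 02:03:04.000', periods=3, freq='1ms')

# Convert the timestamps to int64 nanoseconds since the epoch.
ns = idx.values.astype('datetime64[ns]').astype(np.int64)

# Add half a second, then floor to a whole second (ties round up).
rounded_ns = (ns + NS_PER_SECOND // 2) // NS_PER_SECOND * NS_PER_SECOND

# Convert back; the dtype is still datetime64[ns].
rounded = pd.DatetimeIndex(rounded_ns.astype('datetime64[ns]'))
print(rounded)
```

Staying in integers avoids any intermediate floating-point step, at the cost of a different tie-breaking rule than np.round.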

Old answer (with some more explanation):

Whilst @Matti's answer is clearly the correct way to deal with your situation, I thought I would add an answer showing how you might round a Timestamp to the nearest second:

from pandas import Timestamp

t1 = Timestamp('2012-1-1 00:00:00')
t2 = Timestamp('2012-1-1 00:00:00.000333')

In [4]: t1
Out[4]: Timestamp('2012-01-01 00:00:00')

In [5]: t2
Out[5]: Timestamp('2012-01-01 00:00:00.000333')

In [6]: t2.microsecond
Out[6]: 333

In [7]: t1.value
Out[7]: 1325376000000000000

In [8]: t2.value
Out[8]: 1325376000000333000

# Alternatively (truncates rather than rounds): t2.value - t2.value % 1000000000
In [9]: int(round(t2.value, -9))  # round away milli-, micro- and nanoseconds
Out[9]: 1325376000000000000

In [10]: Timestamp(int(round(t2.value, -9)))
Out[10]: Timestamp('2012-01-01 00:00:00')

Hence you can apply this to the entire index:

def to_the_second(ts):
    return Timestamp(int(round(ts.value, -9)))

dtindex.map(to_the_second)

Answered by wombatonfire

The round() method was added for DatetimeIndex, Timestamp, TimedeltaIndex and Timedelta in pandas 0.18.0. Now we can do the following:

In[114]: index = pd.DatetimeIndex([pd.Timestamp('2012-01-01 02:03:04.000'), pd.Timestamp('2012-01-01 02:03:04.002'), pd.Timestamp('20130712 02:03:04.500'), pd.Timestamp('2012-01-01 02:03:04.501')])

In[115]: index.values
Out[115]: 
array(['2012-01-01T02:03:04.000000000', '2012-01-01T02:03:04.002000000',
       '2013-07-12T02:03:04.500000000', '2012-01-01T02:03:04.501000000'], dtype='datetime64[ns]')

In[116]: index.round('s')
Out[116]: 
DatetimeIndex(['2012-01-01 02:03:04', '2012-01-01 02:03:04',
               '2013-07-12 02:03:04', '2012-01-01 02:03:05'],
              dtype='datetime64[ns]', freq=None)

round() accepts a frequency parameter. String aliases for it are listed in the pandas documentation on offset aliases.
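As a small sketch of how the frequency argument behaves, the snippet below rounds the same four timestamps to the second and compares the result with the companion floor() and ceil() methods. It assumes a recent pandas, where the second alias is spelled 's' in lowercase; the variable names are illustrative only.

```python
import pandas as pd

index = pd.DatetimeIndex([
    '2012-01-01 02:03:04.000',
    '2012-01-01 02:03:04.002',
    '2013-07-12 02:03:04.500',
    '2012-01-01 02:03:04.501',
])

nearest = index.round('s')  # nearest whole second
down = index.floor('s')     # always towards the earlier second
up = index.ceil('s')        # always towards the later second

print(nearest)
print(down)
print(up)
```

Note that the exact half (.500) ends up on :04 with round(), as in the output shown above, whereas ceil() moves anything past a whole second to :05.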

Answered by Matti John

There is little point in changing the index itself, since you can just generate it using date_range with the desired frequency parameter, as in your question.

I assume what you are trying to do is change the frequency of a time series that contains data, in which case you can use resample (see the pandas documentation). For example, if you have the following time series:

import numpy as np
dt_index = pd.date_range('2012-01-01 00:00:00.000', periods=3, freq='1ms')
ts = pd.Series(np.random.randn(3), index=dt_index)


2012-01-01 00:00:00           0.594618
2012-01-01 00:00:00.001000    0.874552
2012-01-01 00:00:00.002000   -0.700076
Freq: L

Then you can change the frequency to seconds using resample, specifying how you want to aggregate the values (mean, sum etc.):

ts.resample('s', closed='right', label='right').sum()  # right-closed, right-labelled bins, matching the output below

2012-01-01 00:00:00    0.594618
2012-01-01 00:00:01    0.174475
Freq: S
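A minimal, deterministic sketch of the same resampling step, assuming a recent pandas where the aggregation is a method on the resampler and bins are left-closed and left-labelled by default. The series values and the 400 ms spacing are invented for illustration.

```python
import pandas as pd

# Four observations, 400 ms apart: 00:00:00.0, .4, .8 and 00:00:01.2
idx = pd.date_range('2012-01-01 00:00:00.000', periods=4, freq='400ms')
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# Downsample to one row per second, choosing how to aggregate each bin.
per_second_sum = ts.resample('s').sum()
per_second_mean = ts.resample('s').mean()

print(per_second_sum)
print(per_second_mean)
```

The first three observations fall into the 00:00:00 bin and the last one into the 00:00:01 bin, so the choice of sum or mean decides what each second's value represents.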

Answered by Daniel Golden

For more general rounding, you can make use of the fact that Pandas Timestamp objects mostly use the standard library datetime.datetime API, including the datetime.datetime.replace() method.

So, to solve your microsecond rounding problem, you could do:

import datetime
import pandas as pd

times = pd.date_range('2012-1-1 02:03:04.499',periods=3,freq='1ms')
# Add 5e5 microseconds and truncate to simulate rounding
times_rounded = [(x + datetime.timedelta(microseconds=5e5)).replace(microsecond=0) for x in times]

from IPython.display import display
print('Before:')
display(list(times))
print('After:')
display(list(times_rounded))

Output:

Before:
[Timestamp('2012-01-01 02:03:04.499000', offset='L'),
 Timestamp('2012-01-01 02:03:04.500000', offset='L'),
 Timestamp('2012-01-01 02:03:04.501000', offset='L')]
After:
[Timestamp('2012-01-01 02:03:04', offset='L'),
 Timestamp('2012-01-01 02:03:05', offset='L'),
 Timestamp('2012-01-01 02:03:05', offset='L')]

You can use the same technique to, e.g., round to the nearest day (as long as you're not concerned about leap seconds and the like):

times = pd.date_range('2012-1-1 08:00:00', periods=3, freq='4h')
# minute=0 is needed too, otherwise non-zero minutes would survive the truncation
times_rounded = [(x + datetime.timedelta(hours=12)).replace(hour=0, minute=0, second=0, microsecond=0) for x in times]
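The add-half-then-truncate trick can also be written without a Python loop. The sketch below is one possible vectorized form, assuming a pandas version that provides DatetimeIndex.floor(); the names `looped` and `vectorized` are illustrative, and the element-wise version is repeated only so the two can be compared.

```python
import datetime
import pandas as pd

times = pd.date_range('2012-01-01 08:00:00', periods=3, freq='4h')

# Element-wise version: add half a day, then zero out the time of day.
looped = [
    (x + datetime.timedelta(hours=12)).replace(hour=0, minute=0, second=0, microsecond=0)
    for x in times
]

# Vectorized version of the same idea: add half a day, then floor to the day.
vectorized = (times + pd.Timedelta(hours=12)).floor('D')

print(looped)
print(list(vectorized))
```

Both versions send 08:00 to the same day and 12:00 or later to the next day, so exact midday ties round up.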

Inspired by this SO post: https://stackoverflow.com/a/19718411/1410871