Python: add days to dates in a dataframe

Source: Stack Overflow, http://stackoverflow.com/questions/16385785/. This content is provided under the CC BY-SA 4.0 license: you are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me).

Time: 2020-08-18 22:27:27  Source: igfitidea

Add days to dates in dataframe

Tags: python, pandas, datetime

Asked by BigHandsome

I am stymied at the moment. I am sure that I am missing something simple, but how do you move a series of dates forward by x units? In my more specific case I want to add 180 days to a date series within a dataframe.

Here is what I have so far:

import pandas, numpy, StringIO, datetime


txt = '''ID,DATE
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-05-07 00:00:00
0088f218a1f00e0fe1b94919dc68ec33,2006-06-03 00:00:00
00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00
00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00
0101d3286dfbd58642a7527ecbddb92e,2007-10-13 00:00:00
0101d3286dfbd58642a7527ecbddb92e,2007-10-27 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2001-02-01 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2008-01-20 00:00:00
'''
df = pandas.read_csv(StringIO.StringIO(txt))
df = df.sort('DATE')
df.DATE = pandas.to_datetime(df.DATE)
df['X_DATE'] = df['DATE'].shift(180, freq=pandas.datetools.Day)

This code generates a type error. For reference I am using:

Python 2.7.4, Pandas '0.12.0.dev-6e7c4d6', Numpy '1.7.1'
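
For readers on Python 3 with a recent pandas, the setup in the question no longer runs as written: the StringIO module moved to io.StringIO, and DataFrame.sort was replaced by sort_values. A minimal sketch of an equivalent setup follows; it uses only three of the question's rows, and parse_dates is a convenience chosen here, not part of the original code.

```python
import io

import pandas as pd

# A shortened copy of the question's data (three of the ten rows).
txt = """ID,DATE
0088f218a1f00e0fe1b94919dc68ec33,2006-05-07 00:00:00
002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00
0103bd73af66e5a44f7867c0bb2203cc,2001-02-01 00:00:00
"""

# io.StringIO replaces the Python 2 StringIO module.
# parse_dates converts the DATE column to datetimes while reading.
df = pd.read_csv(io.StringIO(txt), parse_dates=["DATE"])

# sort_values is the current spelling of the old DataFrame.sort.
df = df.sort_values("DATE")
print(df.dtypes)
```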

Accepted answer by DSM

If I understand you, you don't actually want shift; you simply want to make a new column next to the existing DATE column, holding the dates 180 days later. In that case, you can use timedelta:

>>> from datetime import timedelta
>>> df.head()
                                 ID                DATE
8  0103bd73af66e5a44f7867c0bb2203cc 2001-02-01 00:00:00
0  002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00
1  002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00
5  00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00
4  00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00
>>> df["X_DATE"] = df["DATE"] + timedelta(days=180)
>>> df.head()
                                 ID                DATE              X_DATE
8  0103bd73af66e5a44f7867c0bb2203cc 2001-02-01 00:00:00 2001-07-31 00:00:00
0  002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 2004-02-09 00:00:00
1  002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 2004-02-09 00:00:00
5  00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 2006-09-05 00:00:00
4  00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 2006-09-05 00:00:00
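
As a self-contained sketch of the same idea: shift with a freq argument moves the index rather than the values, whereas plain addition changes the values themselves. The three dates below are taken from the question's data; the small frame and the use of pd.Timedelta (pandas' own counterpart to datetime.timedelta) are assumptions made for illustration.

```python
import pandas as pd

# Sample frame standing in for the question's DATE column.
df = pd.DataFrame(
    {"DATE": pd.to_datetime(["2001-02-01", "2003-08-13", "2006-03-09"])}
)

# Adding a fixed Timedelta moves every value forward; the index is untouched.
df["X_DATE"] = df["DATE"] + pd.Timedelta(days=180)
print(df)
```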

Does that help any?

Answer by dreyco676

For future readers: if you want to change different rows by different amounts, you will need to pass a series of timedeltas instead of a single one, built for example with pandas' TimedeltaIndex or pd.to_timedelta.

For example, I might want to shift my data forward to the end of its report period (here, the Sunday of the same week), where each record could have started on a different day of the week.

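import pandas as pd

# launch_df is assumed to exist already, with a datetime column 'launch_dt'.
# dayofweek is 0 (Monday) to 6 (Sunday), so 6 - dayofweek is the days left until Sunday.
# unit='D' is required: bare integers would otherwise be read as nanoseconds.
days_to_shift = pd.to_timedelta(6 - launch_df['launch_dt'].dt.dayofweek, unit='D')
launch_df['launch_dt'] = launch_df['launch_dt'] + days_to_shift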
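
To make the per-row shift concrete, here is a small runnable sketch. The frame, its dates, and the report_dt column name are hypothetical; pd.to_timedelta with unit="D" is used here to turn the integer day counts into timedeltas.

```python
import pandas as pd

# Hypothetical launch dates: a Monday, a Wednesday and a Sunday.
launch_df = pd.DataFrame(
    {"launch_dt": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-07"])}
)

# dayofweek is 0 for Monday through 6 for Sunday,
# so 6 - dayofweek is the number of days left until Sunday.
days_to_shift = pd.to_timedelta(6 - launch_df["launch_dt"].dt.dayofweek, unit="D")

# Each row is moved by its own amount; a Sunday stays where it is.
launch_df["report_dt"] = launch_df["launch_dt"] + days_to_shift
print(launch_df)
```

All three rows end up on Sunday 2024-01-07, even though each was shifted by a different number of days.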

Answer by Zero

You could use pd.DateOffset, which seems to be faster than timedelta.

In [930]: df['x_DATE'] = df['DATE'] + pd.DateOffset(days=180)

In [931]: df
Out[931]:
                                 ID       DATE     x_DATE
8  0103bd73af66e5a44f7867c0bb2203cc 2001-02-01 2001-07-31
0  002691c9cec109e64558848f1358ac16 2003-08-13 2004-02-09
1  002691c9cec109e64558848f1358ac16 2003-08-13 2004-02-09
4  00d34668025906d55ae2e529615f530a 2006-03-09 2006-09-05
5  00d34668025906d55ae2e529615f530a 2006-03-09 2006-09-05
2  0088f218a1f00e0fe1b94919dc68ec33 2006-05-07 2006-11-03
3  0088f218a1f00e0fe1b94919dc68ec33 2006-06-03 2006-11-30
6  0101d3286dfbd58642a7527ecbddb92e 2007-10-13 2008-04-10
7  0101d3286dfbd58642a7527ecbddb92e 2007-10-27 2008-04-24
9  0103bd73af66e5a44f7867c0bb2203cc 2008-01-20 2008-07-18
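
One point not covered above: for whole days, DateOffset and a timedelta give the same result, but DateOffset can also express calendar units such as months, which have no fixed length. A minimal sketch with two made-up dates:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2003-08-13", "2007-10-31"]))

# For a whole number of days the two spellings agree.
by_offset = dates + pd.DateOffset(days=180)
by_delta = dates + pd.Timedelta(days=180)
print((by_offset == by_delta).all())

# Calendar-aware arithmetic: six months later, clipped to the end of a
# shorter month when the original day does not exist in the target month.
plus_six_months = dates + pd.DateOffset(months=6)
print(plus_six_months)
```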


Timings

Medium

In [948]: df.shape
Out[948]: (10000, 3)

In [950]: %timeit df['DATE'] + pd.DateOffset(days=180)
1000 loops, best of 3: 1.51 ms per loop

In [949]: %timeit df['DATE'] + timedelta(days=180)
100 loops, best of 3: 2.71 ms per loop

Large

In [952]: df.shape
Out[952]: (100000, 3)

In [953]: %timeit df['DATE'] + pd.DateOffset(days=180)
100 loops, best of 3: 4.16 ms per loop

In [955]: %timeit df['DATE'] + timedelta(days=180)
10 loops, best of 3: 20 ms per loop
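
The timings above come from an IPython session on an older pandas, so the numbers may differ on current versions. A rough way to repeat the comparison outside IPython is sketched below with the standard timeit module; the generated frame is a hypothetical stand-in, since the original benchmark data is not shown.

```python
import timeit
from datetime import timedelta

import pandas as pd

# Hypothetical 100,000-row frame of timestamps at one-minute spacing.
df = pd.DataFrame(
    {"DATE": pd.date_range("2001-01-01", periods=100_000, freq="min")}
)


def with_offset():
    return df["DATE"] + pd.DateOffset(days=180)


def with_timedelta():
    return df["DATE"] + timedelta(days=180)


# Best of 3 runs of 10 calls each, reported per call.
for fn in (with_offset, with_timedelta):
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```

Both functions return the same values, so any difference is purely one of speed, and it is worth measuring on your own pandas version before choosing on that basis.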