Disclaimer: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/38355816/

Date: 2020-09-14 01:34:20  Source: igfitidea

Pandas: add timedelta column to datetime column (vectorized)

Tags: pandas, vectorization, timedelta

Question by flyingmeatball

I have a pandas dataframe with two columns, a date column and an int column, and I'd simply like to add the int column (in days) to the date column. I found a solution using df.apply(), but that was too slow on my full dataset. I don't see a ton of documentation on doing this in a vectorized manner (the closest I could find was this), so I wanted to make sure the solution I found was the best way to go forward.

My raw data is just a column of date strings and a column of ints (days).

import pandas as pd
from datetime import timedelta

df = pd.DataFrame([['2016-01-10',28],['2016-05-11',28],['2016-02-23',15],['2015-12-08',30]], 
                  columns = ['ship_string','days_supply'])
print(df)

  ship_string  days_supply
0  2016-01-10           28
1  2016-05-11           28
2  2016-02-23           15
3  2015-12-08           30
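Because `ship_string` holds plain strings, it has to be parsed into a datetime column before any timedelta arithmetic can work. A minimal sketch of that step, assuming every value uses the ISO `YYYY-MM-DD` layout shown above so that an explicit `format` can be passed to `pd.to_datetime` (the explicit format is an addition here, not part of the question):

```python
import pandas as pd

df = pd.DataFrame([['2016-01-10', 28], ['2016-05-11', 28], ['2016-02-23', 15], ['2015-12-08', 30]],
                  columns=['ship_string', 'days_supply'])

# The raw dates are plain strings, not datetimes
print(df.dtypes)

# Parse once, up front; an explicit format (an assumption about the data) skips format inference
df['ship_date'] = pd.to_datetime(df['ship_string'], format='%Y-%m-%d')
print(df['ship_date'].dtype)
```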

My first thought (which worked) was to use .apply as follows:

def f(x):
    # Row-wise: called once per row, which is why it is slow on large frames
    return x['ship_date'] + timedelta(days=x['days_supply'])

df['ship_date'] = pd.to_datetime(df['ship_string'])

df['supply_ended'] = df.apply(f, axis=1)

That worked, but is exceedingly slow. I've posted my alternate solution below as an answer to the question, but I'd like to get confirmation that it is "best practice". I couldn't find many good threads on adding timedelta columns to dates in pandas (especially in a vectorized manner), so thought I'd add one that is a little bit more user friendly and hopefully it will help the next poor soul trying to do this.
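To see how much a vectorized version helps, here is a rough timing sketch. It uses hypothetical synthetic data (20,000 random ship dates and supply lengths), so the absolute numbers depend entirely on your machine and pandas version; the point is only that both approaches give the same dates while the row-wise `apply` makes one Python call per row. The `int()` cast is a precaution added here, since `datetime.timedelta` rejects NumPy integer scalars on Python 3.

```python
import time
from datetime import timedelta

import numpy as np
import pandas as pd

# Hypothetical synthetic data: random ship dates and supply lengths
rng = np.random.default_rng(0)
n = 20_000
df = pd.DataFrame({
    'ship_date': pd.Timestamp('2015-01-01') + pd.to_timedelta(rng.integers(0, 730, n), unit='D'),
    'days_supply': rng.integers(1, 91, n),
})


def f(x):
    # Row-wise version from the question; int() guards against NumPy integer scalars
    return x['ship_date'] + timedelta(days=int(x['days_supply']))


start = time.perf_counter()
by_apply = df.apply(f, axis=1)
apply_secs = time.perf_counter() - start

start = time.perf_counter()
by_vector = df['ship_date'] + pd.to_timedelta(df['days_supply'], unit='D')
vector_secs = time.perf_counter() - start

# Both approaches should produce the same end dates
assert (by_apply == by_vector).all()
print(f'apply: {apply_secs:.3f} s, vectorized: {vector_secs:.4f} s')
```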

Answer by flyingmeatball

Full code solution:

import pandas as pd

df = pd.DataFrame([['2016-01-10',28],['2016-05-11',28],['2016-02-23',15],['2015-12-08',30]],
                  columns = ['ship_string','days_supply'])

# Parse the date strings into a datetime64 column
df['ship_date'] = pd.to_datetime(df['ship_string'])

# Turn the integer day counts into a timedelta64 column, then add the columns element-wise
df['time_added'] = pd.to_timedelta(df['days_supply'],'d')
df['supply_ended'] = df['ship_date'] + df['time_added']

print(df)

  ship_string  days_supply  ship_date  time_added supply_ended
0  2016-01-10           28 2016-01-10     28 days   2016-02-07
1  2016-05-11           28 2016-05-11     28 days   2016-06-08
2  2016-02-23           15 2016-02-23     15 days   2016-03-09
3  2015-12-08           30 2015-12-08     30 days   2016-01-07
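The intermediate `time_added` column is only there for display. If you don't need it, the same result can be computed in one expression; the answer's `pd.to_timedelta(df['days_supply'], 'd')` passes the unit positionally, and pandas reads `'d'` as days, like `unit='D'`. A second spelling, multiplying the integer column by a one-day `pd.Timedelta`, is shown for comparison. Both are sketches assuming a reasonably recent pandas version:

```python
import pandas as pd

df = pd.DataFrame([['2016-01-10', 28], ['2016-05-11', 28], ['2016-02-23', 15], ['2015-12-08', 30]],
                  columns=['ship_string', 'days_supply'])
df['ship_date'] = pd.to_datetime(df['ship_string'])

# Same computation as the answer, without storing the intermediate timedelta column
df['supply_ended'] = df['ship_date'] + pd.to_timedelta(df['days_supply'], unit='D')

# Alternative: scale a one-day Timedelta by the integer column (also vectorized)
alt_scaled = df['ship_date'] + df['days_supply'] * pd.Timedelta(days=1)

print(df[['ship_date', 'days_supply', 'supply_ended']])
```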

Please let me know in the comments below if this isn't a good vectorized solution and I'll edit.
