Pandas: Add a timedelta column to a datetime column (vectorized)

Question

Asked by flyingmeatball

I have a pandas dataframe with two columns, a date column and an int column, and I'd simply like to add the int column (in days) to the date column. I found a solution using df.apply(), but that was too slow on my full dataset. I don't see a ton of documentation on doing this in a vectorized manner (the closest I could find was this), so I wanted to make sure the solution I found was the best way to go forward.

My raw data is just a column of date strings and a column of ints (days).

import pandas as pd
from datetime import timedelta

df = pd.DataFrame([['2016-01-10',28],['2016-05-11',28],['2016-02-23',15],['2015-12-08',30]], 
                  columns = ['ship_string','days_supply'])
print(df)

  ship_string  days_supply
0  2016-01-10           28
1  2016-05-11           28
2  2016-02-23           15
3  2015-12-08           30
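Before any date arithmetic, `ship_string` has to become a real datetime column: as plain strings it has `object` dtype and cannot be added to a timedelta. A minimal sketch of that step, assuming every string uses the `YYYY-MM-DD` layout shown above (the explicit `format` argument is an addition, not part of the question; it states the expected layout up front and, in older pandas versions especially, can speed up parsing):

```python
import pandas as pd

# Same sample data as in the question: dates stored as strings, supply as ints
df = pd.DataFrame(
    [['2016-01-10', 28], ['2016-05-11', 28], ['2016-02-23', 15], ['2015-12-08', 30]],
    columns=['ship_string', 'days_supply'],
)

# Assumes every string is YYYY-MM-DD; a mismatching string raises instead of guessing
df['ship_date'] = pd.to_datetime(df['ship_string'], format='%Y-%m-%d')

# ship_string stays object, ship_date is now a datetime64 column
print(df.dtypes)
```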

My first thought (which worked) was to use .apply as follows:

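def f(x):
    # Row-wise: add this row's day count to this row's ship date
    return x['ship_date'] + timedelta(days=x['days_supply'])

df['ship_date'] = pd.to_datetime(df['ship_string'])

# axis=1 calls f once per row in Python, which is what makes this slow
df['supply_ended'] = df.apply(f, axis=1)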

That worked, but is exceedingly slow. I've posted my alternate solution below as an answer to the question, but I'd like to get confirmation that it is "best practice". I couldn't find many good threads on adding timedelta columns to dates in pandas (especially in a vectorized manner), so I thought I'd add one that is a little more user-friendly and hopefully it will help the next poor soul trying to do this.
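A quick way to gain confidence in a vectorized replacement is to check it against the row-wise version on a small sample. The sketch below does that. `add_days_rowwise` is a hypothetical rename of `f`, and the `int(...)` cast is an added safeguard, since `datetime.timedelta` may reject NumPy integer types.

```python
import pandas as pd
from datetime import timedelta

df = pd.DataFrame(
    [['2016-01-10', 28], ['2016-05-11', 28], ['2016-02-23', 15], ['2015-12-08', 30]],
    columns=['ship_string', 'days_supply'],
)
df['ship_date'] = pd.to_datetime(df['ship_string'])

# Hypothetical rename of the question's f: one Python call per row
def add_days_rowwise(row):
    # int() guards against NumPy integers, which datetime.timedelta may reject
    return row['ship_date'] + timedelta(days=int(row['days_supply']))

slow = df.apply(add_days_rowwise, axis=1)

# Vectorized: the whole int column becomes a timedelta column in one call
fast = df['ship_date'] + pd.to_timedelta(df['days_supply'], unit='D')

# Element-wise comparison; both Series share the same 0..3 index
assert (slow == fast).all()
print(fast)
```

On a sample this small both run instantly; the difference only shows up on large frames, where `apply` pays Python-call overhead for every row.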

Answer 1

Answered by flyingmeatball

Full code solution:

import pandas as pd

df = pd.DataFrame([['2016-01-10',28],['2016-05-11',28],['2016-02-23',15],['2015-12-08',30]],
                  columns = ['ship_string','days_supply'])

# Parse the date strings once into a datetime64 column
df['ship_date'] = pd.to_datetime(df['ship_string'])

# Convert the whole int column to timedeltas in one call (unit='D' means days)
df['time_added'] = pd.to_timedelta(df['days_supply'], unit='D')
df['supply_ended'] = df['ship_date'] + df['time_added']

print(df)

  ship_string  days_supply  ship_date  time_added supply_ended
0  2016-01-10           28 2016-01-10     28 days   2016-02-07
1  2016-05-11           28 2016-05-11     28 days   2016-06-08
2  2016-02-23           15 2016-02-23     15 days   2016-03-09
3  2015-12-08           30 2015-12-08     30 days   2016-01-07

Please let me know in the comments below if this isn't a good vectorized solution and I'll edit.
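The helper `time_added` column is optional. A sketch of two equivalent one-liners, assuming the same sample `df`: building the timedelta inline, or multiplying the integer column by a one-day `pd.Timedelta`. Both stay vectorized; which to prefer is mostly a matter of readability.

```python
import pandas as pd

df = pd.DataFrame(
    [['2016-01-10', 28], ['2016-05-11', 28], ['2016-02-23', 15], ['2015-12-08', 30]],
    columns=['ship_string', 'days_supply'],
)
df['ship_date'] = pd.to_datetime(df['ship_string'])

# Inline: no intermediate timedelta column is stored on the frame
df['supply_ended'] = df['ship_date'] + pd.to_timedelta(df['days_supply'], unit='D')

# Alternative: int Series * Timedelta scalar yields a timedelta Series
alt = df['ship_date'] + df['days_supply'] * pd.Timedelta(days=1)

# Both approaches should agree element by element
assert (alt == df['supply_ended']).all()
print(df[['ship_date', 'days_supply', 'supply_ended']])
```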
