Merging two dataframes based on a date between two other dates without a common column
Source: http://stackoverflow.com/questions/43593554/ (Stack Overflow, licensed under CC BY-SA 4.0; reuse must attribute the original authors)
Asked by P Spence
I have two dataframes that I need to merge based on whether a date value falls between two other dates. Basically, I need to perform an outer join where B.event_date is between A.start_date and A.end_date. It seems that merge and join always assume a common column, which I do not have in this case.
A
   start_date    end_date
0  2017-03-27  2017-04-20
1  2017-01-10  2017-02-01

B
   event_date  price
0  2017-01-20    100
1  2017-01-27    200

Result
   start_date    end_date  event_date  price
0  2017-03-27  2017-04-20
1  2017-01-10  2017-02-01  2017-01-20    100
2  2017-01-10  2017-02-01  2017-01-27    200
Answered by Scott Boston
Create data and format to datetimes:
import pandas as pd

df_A = pd.DataFrame({'start_date':['2017-03-27','2017-01-10'],'end_date':['2017-04-20','2017-02-01']})
df_B = pd.DataFrame({'event_date':['2017-01-20','2017-01-27'],'price':[100,200]})
# Convert the string columns to datetimes so they compare as dates, not text.
df_A['end_date'] = pd.to_datetime(df_A.end_date)
df_A['start_date'] = pd.to_datetime(df_A.start_date)
df_B['event_date'] = pd.to_datetime(df_B.event_date)
Create keys to do a cross join:
df_A = df_A.assign(key=1)
df_B = df_B.assign(key=1)
df_merge = pd.merge(df_A, df_B, on='key').drop('key',axis=1)
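On pandas 1.2 or later, the constant key is arguably unnecessary, because merge accepts how='cross'. The following is a minimal sketch under that version assumption; it rebuilds the same sample frames so it runs on its own, and the variable names simply follow the answer.

```python
import pandas as pd

df_A = pd.DataFrame({
    'start_date': pd.to_datetime(['2017-03-27', '2017-01-10']),
    'end_date': pd.to_datetime(['2017-04-20', '2017-02-01']),
})
df_B = pd.DataFrame({
    'event_date': pd.to_datetime(['2017-01-20', '2017-01-27']),
    'price': [100, 200],
})

# Cartesian product: every row of df_A is paired with every row of df_B.
# Assumption: pandas >= 1.2, where how='cross' was introduced.
df_merge = pd.merge(df_A, df_B, how='cross')

print(df_merge.shape)  # (4, 4): 2 ranges x 2 events, 4 columns
```

With this variant df_A never gains a key column, so the drop('key', axis=1) in the final join step would not be needed and would raise a KeyError if left in. Either way, a cross join produces len(df_A) * len(df_B) rows, so it can become expensive on large frames.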
Filter out records that do not meet criteria of event dates between start and end dates:
df_merge = df_merge.query('event_date >= start_date and event_date <= end_date')
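The same inclusive range test can probably also be written without a query string, using Series.between with the two bound columns. This is only a sketch of an equivalent filter; the cross-joined frame is typed out by hand here so the example is self-contained, and the names in_range and filtered are illustrative.

```python
import pandas as pd

# Hand-built stand-in for the cross-joined frame from the previous step.
df_merge = pd.DataFrame({
    'start_date': pd.to_datetime(['2017-03-27', '2017-03-27', '2017-01-10', '2017-01-10']),
    'end_date': pd.to_datetime(['2017-04-20', '2017-04-20', '2017-02-01', '2017-02-01']),
    'event_date': pd.to_datetime(['2017-01-20', '2017-01-27', '2017-01-20', '2017-01-27']),
    'price': [100, 200, 100, 200],
})

# Boolean mask: True where start_date <= event_date <= end_date
# (between is inclusive on both ends by default).
in_range = df_merge['event_date'].between(df_merge['start_date'], df_merge['end_date'])
filtered = df_merge[in_range]

print(filtered)
```

Only the two rows belonging to the 2017-01-10 to 2017-02-01 range survive, which matches what the query expression keeps.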
Join back to the original date range table and drop the key column:
df_out = df_A.merge(df_merge, on=['start_date','end_date'], how='left').fillna('').drop('key', axis=1)
print(df_out)
Output:
              end_date           start_date           event_date price
0  2017-04-20 00:00:00  2017-03-27 00:00:00
1  2017-02-01 00:00:00  2017-01-10 00:00:00  2017-01-20 00:00:00   100
2  2017-02-01 00:00:00  2017-01-10 00:00:00  2017-01-27 00:00:00   200
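The steps of the answer can be collected into one function. The sketch below is a hypothetical helper (merge_between and its parameter names are not from the original answer), and it makes one deliberate change: it skips fillna(''), so unmatched ranges keep NaT/NaN and the date columns stay datetimes. It assumes the two frames share no column names and that the range rows are unique.

```python
import pandas as pd


def merge_between(ranges, events, start='start_date', end='end_date', when='event_date'):
    """Left-join events onto ranges where start <= when <= end (hypothetical helper)."""
    # Cross join through a temporary constant key, as in the answer.
    pairs = pd.merge(ranges.assign(_key=1), events.assign(_key=1), on='_key').drop(columns='_key')
    # Keep only pairs whose event date lies inside the range (inclusive).
    matched = pairs[(pairs[when] >= pairs[start]) & (pairs[when] <= pairs[end])]
    # Left join back so ranges without any event survive with NaT/NaN.
    return ranges.merge(matched, on=list(ranges.columns), how='left')


df_A = pd.DataFrame({
    'start_date': pd.to_datetime(['2017-03-27', '2017-01-10']),
    'end_date': pd.to_datetime(['2017-04-20', '2017-02-01']),
})
df_B = pd.DataFrame({
    'event_date': pd.to_datetime(['2017-01-20', '2017-01-27']),
    'price': [100, 200],
})

df_out = merge_between(df_A, df_B)
print(df_out)
```

Because missing prices are stored as NaN, the price column comes back as floats; call fillna('') on the result only if blank cells are wanted for display.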