SQL-like joins in pandas
Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/14298401/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me):
StackOverFlow
SQL like joins in pandas
Asked by landewednack
I have two dataframes, the first is of the form (note that the dates are datetime objects):
df = DataFrame({'key': [0, 1, 2, 3, 4, 5],
                'date': [date0, date1, date2, date3, date4, date5],
                'value': [0, 10, 20, 30, 40, 50]})
And a second which is of the form:
df2 = DataFrame({'key': [0, 1, 2, 3, 4, 5],
                 'valid_from': [date0, date0, date0, date3, date3, date3],
                 'valid_to': [date2, date2, date2, date5, date5, date5],
                 'value': [0, 100, 200, 300, 400, 500]})
And I'm trying to efficiently join where the keys match and the date is between the valid_from and valid_to. What I've come up with is the following:
def map_keys(df2, key, date):
    # Parentheses are required: & binds tighter than the comparison operators
    value = df2[(df2['key'] == key) &
                (df2['valid_from'] <= date) &
                (df2['valid_to'] >= date)]['value'].values[0]
    return value
keys = df['key'].values
dates = df['date'].values
keys_dates = zip(keys, dates)

values = []
for key_date in keys_dates:
    value = map_keys(df2, key_date[0], key_date[1])
    values.append(value)

df['joined_value'] = values
While this seems to do the job it doesn't feel like a particularly elegant solution. I was wondering if anybody had a better idea for a join such as this.
Thanks for your help - it is much appreciated.
Answered by Garrett
Currently, you can do this in a few steps with the built-in pandas.merge() and boolean indexing.
merged = df.merge(df2, on='key')
valid = (merged.date >= merged.valid_from) & \
(merged.date <= merged.valid_to)
df['joined_value'] = merged[valid].value_y
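The last assignment works because of pandas index alignment: merged[valid] drops the rows that fail the date test, and assigning the shorter Series back to df fills the missing index positions with NaN. A minimal sketch of that mechanism, using toy data rather than the frames above:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})          # index 0, 1, 2
s = pd.Series([10, 30], index=[0, 2])        # index label 1 is missing

# Assignment aligns on index labels; the missing label becomes NaN
df['b'] = s
print(df['b'].tolist())  # [10.0, nan, 30.0]
```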
(Note: the value column of df2 is accessed as value_y after the merge because it conflicts with a column of the same name in df, and the default merge-conflict suffixes are _x and _y for the left and right frames, respectively.)
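If the default _x/_y names are inconvenient, merge accepts a suffixes parameter to rename conflicting columns explicitly. A small sketch with hypothetical toy frames:

```python
import pandas as pd

df = pd.DataFrame({'key': [0, 1], 'value': [10, 20]})
df2 = pd.DataFrame({'key': [0, 1], 'value': [100, 200]})

# Explicit suffixes replace the default ('_x', '_y') on name clashes
merged = df.merge(df2, on='key', suffixes=('_left', '_right'))
print(merged.columns.tolist())  # ['key', 'value_left', 'value_right']
```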
Here's an example, with a different setup to show how invalid dates are handled.
import numpy as np
import pandas as pd
from pandas import DataFrame

n = 8
dates = pd.date_range('1/1/2013', freq='D', periods=n)
df = DataFrame({'key': np.arange(n),
                'date': dates,
                'value': np.arange(n) * 10})
df2 = DataFrame({'key': np.arange(n),
                 'valid_from': dates[[1,1,1,1,5,5,5,5]],
                 'valid_to': dates[[4,4,4,4,6,6,6,6]],
                 'value': np.arange(n) * 100})
Input df2:
key valid_from valid_to value
0 0 2013-01-02 00:00:00 2013-01-05 00:00:00 0
1 1 2013-01-02 00:00:00 2013-01-05 00:00:00 100
2 2 2013-01-02 00:00:00 2013-01-05 00:00:00 200
3 3 2013-01-02 00:00:00 2013-01-05 00:00:00 300
4 4 2013-01-06 00:00:00 2013-01-07 00:00:00 400
5 5 2013-01-06 00:00:00 2013-01-07 00:00:00 500
6 6 2013-01-06 00:00:00 2013-01-07 00:00:00 600
7 7 2013-01-06 00:00:00 2013-01-07 00:00:00 700
Intermediate frame merged:
date key value_x valid_from valid_to value_y
0 2013-01-01 00:00:00 0 0 2013-01-02 00:00:00 2013-01-05 00:00:00 0
1 2013-01-02 00:00:00 1 10 2013-01-02 00:00:00 2013-01-05 00:00:00 100
2 2013-01-03 00:00:00 2 20 2013-01-02 00:00:00 2013-01-05 00:00:00 200
3 2013-01-04 00:00:00 3 30 2013-01-02 00:00:00 2013-01-05 00:00:00 300
4 2013-01-05 00:00:00 4 40 2013-01-06 00:00:00 2013-01-07 00:00:00 400
5 2013-01-06 00:00:00 5 50 2013-01-06 00:00:00 2013-01-07 00:00:00 500
6 2013-01-07 00:00:00 6 60 2013-01-06 00:00:00 2013-01-07 00:00:00 600
7 2013-01-08 00:00:00 7 70 2013-01-06 00:00:00 2013-01-07 00:00:00 700
Final value of df after adding column joined_value:
date key value joined_value
0 2013-01-01 00:00:00 0 0 NaN
1 2013-01-02 00:00:00 1 10 100
2 2013-01-03 00:00:00 2 20 200
3 2013-01-04 00:00:00 3 30 300
4 2013-01-05 00:00:00 4 40 NaN
5 2013-01-06 00:00:00 5 50 500
6 2013-01-07 00:00:00 6 60 600
7 2013-01-08 00:00:00 7 70 NaN
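As a side note not in the original answer: newer pandas versions (0.19+) ship pd.merge_asof, which can express this kind of range lookup without materializing the merged-then-filtered frame by hand. A sketch under the same setup as above, where each df row picks up the last df2 row (same key) whose valid_from is not after its date, and matches past valid_to are nulled out:

```python
import numpy as np
import pandas as pd

n = 8
dates = pd.date_range('1/1/2013', freq='D', periods=n)
df = pd.DataFrame({'key': np.arange(n),
                   'date': dates,
                   'value': np.arange(n) * 10})
df2 = pd.DataFrame({'key': np.arange(n),
                    'valid_from': dates[[1,1,1,1,5,5,5,5]],
                    'valid_to': dates[[4,4,4,4,6,6,6,6]],
                    'value': np.arange(n) * 100})

# merge_asof requires both frames sorted on the asof columns
asof = pd.merge_asof(df.sort_values('date'),
                     df2.sort_values('valid_from'),
                     left_on='date', right_on='valid_from', by='key')

# Keep the matched value only while the date is still within valid_to
df['joined_value'] = asof['value_y'].where(asof['date'] <= asof['valid_to'])
```

This yields the same NaN/100/200/300/NaN/500/600/NaN column as the table above.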

