Original URL: http://stackoverflow.com/questions/33491840/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
merging two pandas dataframes on nearest time stamp
Asked by Srirekha Srirekha
I have two DataFrames, df1 and df2.
df1 is:
time               status
2/2/2015 8.00 am   on time
2/2/2015 9.00 am   canceled
2/2/2015 10.30 am  on time
2/2/2015 12.45 pm  on time
df2 is:
w_time             temp
2/2/2015 8.00 am   45
2/2/2015 8.50 am   46
2/2/2015 9.40 am   47
2/2/2015 10.15 am  47
2/2/2015 10.35 am  48
2/2/2015 12.00 pm  48
2/2/2015 1.00 pm   49
Now I want to merge the two DataFrames so that each row of df1 is paired with the row of df2 whose timestamp is closest to (or equal to) its own timestamp.
The result should be:
time               status    w_time             temp
2/2/2015 8.00 am   on time   2/2/2015 8.00 am   45
2/2/2015 9.00 am   canceled  2/2/2015 8.50 am   46
2/2/2015 10.30 am  on time   2/2/2015 10.35 am  48
2/2/2015 12.45 pm  on time   2/2/2015 1.00 pm   49
Answered by Andy Hayden
First ensure that the date columns are datetime64 columns.
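import pandas as pd
# Swap the "." in times like "8.00" for ":" (literally, not as a regex), then parse.
df1['time'] = pd.to_datetime(df1['time'].str.replace(".", ":", regex=False))
df2['w_time'] = pd.to_datetime(df2['w_time'].str.replace(".", ":", regex=False))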
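A possible alternative for the conversion step, sketched under the assumption that every string follows the question's pattern (month/day/year, a "." between hours and minutes, and an English "am"/"pm" marker): passing an explicit format lets pd.to_datetime read the dotted times directly, without the string replacement, and makes the 12-hour clock explicit. The sample values are copied from df2 in the question.

```python
import pandas as pd

# Sample strings copied from the question's df2 (assumed month/day/year order).
w_time = pd.Series(["2/2/2015 8.00 am", "2/2/2015 8.50 am", "2/2/2015 9.40 am",
                    "2/2/2015 10.15 am", "2/2/2015 10.35 am",
                    "2/2/2015 12.00 pm", "2/2/2015 1.00 pm"])

# %I is the 12-hour clock and %p the am/pm marker; the "." is matched literally.
parsed = pd.to_datetime(w_time, format="%m/%d/%Y %I.%M %p")
print(parsed)
```

With %I and %p, "12.00 pm" is read as noon and "1.00 pm" as 13:00.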
If you set these as DatetimeIndexes, you can then use reindex with the 'nearest' method (the index being reindexed, df2's here, must be sorted):
In [11]: df1 = df1.set_index("time")
In [12]: df2 = df2.set_index("w_time", drop=False)
In [13]: df1
Out[13]:
                       status
time
2015-02-02 08:00:00   on time
2015-02-02 09:00:00  canceled
2015-02-02 10:30:00   on time
2015-02-02 12:45:00   on time
In [14]: df2
Out[14]:
                     temp               w_time
w_time
2015-02-02 08:00:00    45  2015-02-02 08:00:00
2015-02-02 08:50:00    46  2015-02-02 08:50:00
2015-02-02 09:40:00    47  2015-02-02 09:40:00
2015-02-02 10:15:00    47  2015-02-02 10:15:00
2015-02-02 10:35:00    48  2015-02-02 10:35:00
2015-02-02 12:00:00    48  2015-02-02 12:00:00
2015-02-02 13:00:00    49  2015-02-02 13:00:00
Reindexing df2 onto df1's index then picks, for each df1 timestamp, the df2 row with the nearest w_time:
In [15]: df2.reindex(df1.index, method='nearest')
Out[15]:
                     temp               w_time
time
2015-02-02 08:00:00    45  2015-02-02 08:00:00
2015-02-02 09:00:00    46  2015-02-02 08:50:00
2015-02-02 10:30:00    48  2015-02-02 10:35:00
2015-02-02 12:45:00    49  2015-02-02 13:00:00
Then add these columns to df1, for example by joining the result back on the shared time index.
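A minimal end-to-end sketch of the whole answer, including the final join. The frames are rebuilt from the question's data with the times already parsed; the names left, right, nearest and result are only illustrative.

```python
import pandas as pd

# The question's data, with the "." already replaced by ":" and parsed.
df1 = pd.DataFrame({
    "time": pd.to_datetime(["2015-02-02 08:00", "2015-02-02 09:00",
                            "2015-02-02 10:30", "2015-02-02 12:45"]),
    "status": ["on time", "canceled", "on time", "on time"],
})
df2 = pd.DataFrame({
    "w_time": pd.to_datetime(["2015-02-02 08:00", "2015-02-02 08:50",
                              "2015-02-02 09:40", "2015-02-02 10:15",
                              "2015-02-02 10:35", "2015-02-02 12:00",
                              "2015-02-02 13:00"]),
    "temp": [45, 46, 47, 47, 48, 48, 49],
})

# Index both frames by time; keep w_time as a column so the matched stamp survives.
left = df1.set_index("time")
# reindex(method=...) needs a sorted (monotonic) index, hence sort_index().
right = df2.set_index("w_time", drop=False).sort_index()

# For every df1 timestamp, take the df2 row whose w_time is closest ...
nearest = right.reindex(left.index, method="nearest")

# ... and join those columns back onto df1, turning the index into a column again.
result = left.join(nearest).reset_index()
print(result)
```

join aligns on the shared time index, so each df1 row receives the temp and w_time of its nearest df2 row, which gives the column layout the question asks for.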
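On pandas 0.20 or later, pd.merge_asof with direction="nearest" is another way to do the same nearest-key match in a single call. This is a different technique from the reindex approach in the answer, shown here as a sketch on the question's data; it assumes both key columns are datetime64 and sorts them, because merge_asof requires sorted keys.

```python
import pandas as pd

df1 = pd.DataFrame({
    "time": pd.to_datetime(["2015-02-02 08:00", "2015-02-02 09:00",
                            "2015-02-02 10:30", "2015-02-02 12:45"]),
    "status": ["on time", "canceled", "on time", "on time"],
})
df2 = pd.DataFrame({
    "w_time": pd.to_datetime(["2015-02-02 08:00", "2015-02-02 08:50",
                              "2015-02-02 09:40", "2015-02-02 10:15",
                              "2015-02-02 10:35", "2015-02-02 12:00",
                              "2015-02-02 13:00"]),
    "temp": [45, 46, 47, 47, 48, 48, 49],
})

# Both inputs must be sorted by their key columns.
result = pd.merge_asof(
    df1.sort_values("time"),
    df2.sort_values("w_time"),
    left_on="time",
    right_on="w_time",
    direction="nearest",  # closest w_time, whether earlier or later
)
print(result)
```

Both reindex and merge_asof also accept a tolerance argument (for example tolerance=pd.Timedelta("30min")) if matches that are too far away should be left as NaN instead.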