Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/45175060/

Time: 2020-09-14 04:02:04  Source: igfitidea

Merge DataFrames with Matching Values From Two Different Columns - Pandas

python pandas dataframe merge

Asked by i.n.n.m

I have two different DataFrames that I want to merge on the date and hours columns. I saw some existing threads, but I could not find a solution for my issue. I also read this document and tried different combinations; however, they did not work well.

Examples of my two DataFrames:

DF1

        date    hours        var1            var2 
0   2013-07-10  00:00:00    150.322617  52.225920   
1   2013-07-10  01:00:00    155.250917  53.365296   
2   2013-07-10  02:00:00    124.918667  51.158249   
3   2013-07-10  03:00:00    143.839217  53.138251
 .....  
9   2013-09-10  09:00:00    148.135818  86.676341
10  2013-09-10  10:00:00    147.833517  53.658016   
11  2013-09-10  12:00:00    149.580233  69.745368   
12  2013-09-10  13:00:00    163.715317  14.524894   
13  2013-09-10  14:00:00    168.856650  10.762779   

DF2

       date      hours      myvar1        myvar2 
0   2013-07-10  09:00:00    1.617         98.56 
1   2013-07-10  10:00:00    2.917         23.60 
2   2013-07-10  12:00:00    19.667        36.15 
3   2013-07-10  13:00:00    14.217        45.16
 .....  
20 2013-09-10   20:00:00    1.517         53.56 
21 2013-09-10   21:00:00    5.233         69.47
22 2013-09-10   22:00:00    13.717        14.25
23 2013-09-10   23:00:00    18.850        10.69 

As you can see in both DataFrames, DF2 starts with 09:00:00, and I want to join it with DF1 at 09:00:00, that is, on the matching dates and times. So far, I have tried many different combinations based on previous threads and the documentation mentioned above. For example:

merged_df = DF2.merge(DF1, how = 'left', on = ['date', 'hours'])

This introduces NaN values for the right DataFrame. I know I do not have to use both the date and hours columns; however, I still get the same result. I quickly tried this in R, which works perfectly fine:

merged_df  <- left_join(DF1, DF2, by = 'date')

Is there any way in pandas to merge DataFrames on matching values only, without getting NaN values?

Answered by Scott Boston

Use how='inner' in pd.merge:

merged_df = DF2.merge(DF1, how = 'inner', on = ['date', 'hours'])

This will perform an "inner join", thereby omitting rows in each DataFrame that do not match. Hence, the merge introduces no NaN values in either the left or right part of the merged DataFrame.
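
As a minimal sketch of the difference, the toy frames below share only two of their three date/hours keys. The values are hypothetical, not the asker's data, and the names df1 and df2 are placeholders. A left merge keeps every row of the left frame and fills the unmatched columns with NaN, while an inner merge keeps only the rows whose keys appear in both frames.

```python
import pandas as pd

# Hypothetical toy data; the values are illustrative only.
df1 = pd.DataFrame({
    "date": ["2013-07-10", "2013-07-10", "2013-07-10"],
    "hours": ["08:00:00", "09:00:00", "10:00:00"],
    "var1": [1.0, 2.0, 3.0],
})
df2 = pd.DataFrame({
    "date": ["2013-07-10", "2013-07-10", "2013-07-10"],
    "hours": ["09:00:00", "10:00:00", "12:00:00"],
    "myvar1": [10.0, 20.0, 30.0],
})

# Left merge: all three rows of df2 survive; 12:00:00 has no partner in df1,
# so its var1 value is NaN.
left = df2.merge(df1, how="left", on=["date", "hours"])

# Inner merge: only the keys present in both frames (09:00:00 and 10:00:00).
inner = df2.merge(df1, how="inner", on=["date", "hours"])

print(left)
print(inner)
```

Note that an inner merge silently drops the unmatched rows, so it is worth comparing len(inner) with the lengths of the inputs before relying on the result.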

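
If a merge matches fewer rows than expected, it may help to see which keys failed to match. One possible way, sketched here on hypothetical toy data, is the indicator=True option of merge, which adds a _merge column marking each row as both, left_only, or right_only. A possible cause in practice (an assumption, since the question does not show the column dtypes) is that the key columns differ in type or formatting between the two frames.

```python
import pandas as pd

# Hypothetical toy data; the values are illustrative only.
df1 = pd.DataFrame({
    "date": ["2013-07-10", "2013-07-10"],
    "hours": ["09:00:00", "10:00:00"],
    "var1": [2.0, 3.0],
})
df2 = pd.DataFrame({
    "date": ["2013-07-10", "2013-07-10", "2013-07-10"],
    "hours": ["09:00:00", "10:00:00", "12:00:00"],
    "myvar1": [10.0, 20.0, 30.0],
})

# indicator=True adds a "_merge" column describing where each row came from.
check = df2.merge(df1, how="left", on=["date", "hours"], indicator=True)

# Rows of df2 whose (date, hours) key has no partner in df1.
unmatched = check[check["_merge"] == "left_only"]
print(unmatched[["date", "hours"]])

# Comparing the key dtypes of both frames is a cheap first check.
print(df1[["date", "hours"]].dtypes)
print(df2[["date", "hours"]].dtypes)
```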