JOIN two dataframes on common column in python

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/41463119/

python, pandas, join

Asked by Shubham R

I have a dataframe df:

id   name   count
1    a       10
2    b       20
3    c       30
4    d       40
5    e       50

Here I have another dataframe df2:

id1  price   rating
 1     100     1.0
 2     200     2.0
 3     300     3.0
 5     500     5.0
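
To reproduce the examples, the two frames can be rebuilt roughly as follows. This is a sketch: the integer dtypes for id, count, and price are assumed from how the tables are printed, and are not stated in the question.

```python
import pandas as pd

# The question's two frames, rebuilt by hand from the tables above.
# Integer dtypes for id, count and price are an assumption.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["a", "b", "c", "d", "e"],
    "count": [10, 20, 30, 40, 50],
})
df2 = pd.DataFrame({
    "id1": [1, 2, 3, 5],          # note: there is no row for id 4
    "price": [100, 200, 300, 500],
    "rating": [1.0, 2.0, 3.0, 5.0],
})

print(df)
print(df2)
```

Because `count` is also a DataFrame method name, the column has to be accessed as `df["count"]` rather than `df.count`.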

I want to join these two dataframes on the columns id and id1 (both refer to the same thing). Here is the expected result, df3:

id   name   count   price   rating
1    a       10      100      1.0
2    b       20      200      2.0
3    c       30      300      3.0
4    d       40      NaN      NaN
5    e       50      500      5.0

Should I use df.merge or pd.concat?

Answered by jezrael

Use merge (df1 here is the question's df):

print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left').drop('id1', axis=1))
   id name  count  price  rating
0   1    a     10  100.0     1.0
1   2    b     20  200.0     2.0
2   3    c     30  300.0     3.0
3   4    d     40    NaN     NaN
4   5    e     50  500.0     5.0
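
The how='left' argument is what keeps id 4 in the result. A small sketch of the difference, assuming the frames from the question (rebuilt here so the snippet runs on its own); the indicator=True flag is an addition for illustration and is not part of the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                    "name": list("abcde"),
                    "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],
                    "price": [100, 200, 300, 500],
                    "rating": [1.0, 2.0, 3.0, 5.0]})

# how='left' keeps every row of df1; indicator=True adds a '_merge'
# column saying whether each row found a partner in df2.
left = pd.merge(df1, df2, left_on="id", right_on="id1",
                how="left", indicator=True).drop("id1", axis=1)
print(left[["id", "price", "_merge"]])

# The default how='inner' keeps only ids present in both frames,
# so the row for id 4 disappears.
inner = pd.merge(df1, df2, left_on="id", right_on="id1").drop("id1", axis=1)
print(inner["id"].tolist())
```

The unmatched row is also why price is printed as a float: the missing value is stored as NaN, which forces the column to a floating-point dtype.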

Another solution is to simply rename the column:

print (pd.merge(df1, df2.rename(columns={'id1':'id'}), on='id',  how='left'))
   id name  count  price  rating
0   1    a     10  100.0     1.0
1   2    b     20  200.0     2.0
2   3    c     30  300.0     3.0
3   4    d     40    NaN     NaN
4   5    e     50  500.0     5.0

If you need only the price column, the simplest option is map:

df1['price'] = df1.id.map(df2.set_index('id1')['price'])
print (df1)
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0
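
The map approach works because a Series indexed by id1 behaves like a lookup table from id to price. A sketch that spells this out, assuming the question's frames; the name price_by_id and the use of assign (to avoid modifying df1 in place) are illustrative choices, not part of the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                    "name": list("abcde"),
                    "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],
                    "price": [100, 200, 300, 500],
                    "rating": [1.0, 2.0, 3.0, 5.0]})

# Series indexed by id1: effectively an id -> price lookup table.
price_by_id = df2.set_index("id1")["price"]

# The lookup is only well defined if each id1 appears once.
assert price_by_id.index.is_unique

# assign returns a new frame and leaves df1 untouched;
# ids with no entry in the lookup (here id 4) become NaN.
with_price = df1.assign(price=df1["id"].map(price_by_id))
print(with_price)
```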

Two more solutions:

print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left')
         .drop(['id1', 'rating'], axis=1))
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0


print (pd.merge(df1, df2[['id1','price']], left_on='id', right_on='id1', how='left')
         .drop('id1', axis=1))
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0

Answered by piRSquared

join utilizes the index to merge on unless we specify a column to use instead. However, we can only specify a column instead of the index for the 'left' dataframe.

Strategy:

  • set_index on df2 to be id1
  • use join with df as the left dataframe and id as the on parameter. Note that I could have called set_index('id') on df to avoid having to use the on parameter. However, this allowed me to leave the column in the dataframe rather than having to reset_index later.
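
The two variants described in the strategy can be sketched side by side. This assumes the frames from the question; the rename_axis call is an addition not in the original answer, used so the joined index is unambiguously named id before reset_index:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                   "name": list("abcde"),
                   "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],
                    "price": [100, 200, 300, 500],
                    "rating": [1.0, 2.0, 3.0, 5.0]})

# Variant A (the answer's approach): df keeps id as a regular column
# and on='id' tells join to match it against df2's index.
a = df.join(df2.set_index("id1"), on="id")

# Variant B: index both frames by the key, join index-to-index,
# then turn the index back into an id column.
right = df2.set_index("id1").rename_axis("id")
b = df.set_index("id").join(right).reset_index()

print(a)
print(b)
```

Variant A saves the set_index/reset_index round trip on the left frame, which is the convenience the answer points out.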


df.join(df2.set_index('id1'), on='id')

   id name  count  price  rating
0   1    a     10  100.0     1.0
1   2    b     20  200.0     2.0
2   3    c     30  300.0     3.0
3   4    d     40    NaN     NaN
4   5    e     50  500.0     5.0


If you only want price from df2:

df.join(df2.set_index('id1')[['price']], on='id')


   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0
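
On the question's merge-or-concat choice: both answers use merge or join, but pd.concat can produce the same table if both frames are first indexed by the key, since concat with axis=1 aligns rows by index label. A hedged sketch under that assumption, with the frames rebuilt from the question:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                   "name": list("abcde"),
                   "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],
                    "price": [100, 200, 300, 500],
                    "rating": [1.0, 2.0, 3.0, 5.0]})

# concat(axis=1) aligns on the index, so both frames are indexed by
# their key first; rename_axis names the combined index 'id'.
df3 = (pd.concat([df.set_index("id"), df2.set_index("id1")], axis=1)
         .rename_axis("id")
         .reset_index())
print(df3)

# Caveat: concat defaults to an outer join. It matches the left merge
# here only because every id1 in df2 also appears in df.
```

For a key-based lookup like this one, merge or join states the intent more directly and handles the differently named key columns without re-indexing.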