Original source: http://stackoverflow.com/questions/41463119/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Join two dataframes on a common column in Python
Asked by Shubham R
I have a dataframe df:
id name count
1 a 10
2 b 20
3 c 30
4 d 40
5 e 50
Here I have another dataframe df2:
id1 price rating
1 100 1.0
2 200 2.0
3 300 3.0
5 500 5.0
I want to join these two dataframes on columns id and id1 (both refer to the same key). Here is the result I want, df3:
id name count price rating
1 a 10 100 1.0
2 b 20 200 2.0
3 c 30 300 3.0
4 d 40 NaN NaN
5 e 50 500 5.0
Should I use df.merge or pd.concat?
Answered by jezrael
Use merge (df1 below is the question's df):
print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left').drop('id1', axis=1))
id name count price rating
0 1 a 10 100.0 1.0
1 2 b 20 200.0 2.0
2 3 c 30 300.0 3.0
3 4 d 40 NaN NaN
4 5 e 50 500.0 5.0
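For anyone who wants to run the merge call above end to end, here is a minimal sketch. The two frames are rebuilt by hand from the sample data in the question, so the construction code is an assumption about how the original frames were created; the merge line itself is the one from this answer.

```python
import pandas as pd

# Frames rebuilt from the question's sample data (construction is assumed).
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["a", "b", "c", "d", "e"],
    "count": [10, 20, 30, 40, 50],
})
df2 = pd.DataFrame({
    "id1": [1, 2, 3, 5],
    "price": [100, 200, 300, 500],
    "rating": [1.0, 2.0, 3.0, 5.0],
})

# how='left' keeps every row of df1; id 4 has no match in df2,
# so its price and rating come back as NaN.
df3 = pd.merge(df1, df2, left_on="id", right_on="id1", how="left")

# The right-hand key column id1 is redundant after the merge, so drop it.
df3 = df3.drop("id1", axis=1)
print(df3)
```

The price column becomes float because NaN cannot be stored in an integer column by default.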
Another solution is to simply rename the column:
print (pd.merge(df1, df2.rename(columns={'id1':'id'}), on='id', how='left'))
id name count price rating
0 1 a 10 100.0 1.0
1 2 b 20 200.0 2.0
2 3 c 30 300.0 3.0
3 4 d 40 NaN NaN
4 5 e 50 500.0 5.0
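As a quick check that the rename variant gives the same table as the left_on/right_on variant, the sketch below runs both on frames rebuilt from the question's sample data (the construction code and the variable names via_keys and via_rename are my own, not from the answer).

```python
import pandas as pd

# Frames rebuilt from the question's sample data (construction is assumed).
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["a", "b", "c", "d", "e"],
    "count": [10, 20, 30, 40, 50],
})
df2 = pd.DataFrame({
    "id1": [1, 2, 3, 5],
    "price": [100, 200, 300, 500],
    "rating": [1.0, 2.0, 3.0, 5.0],
})

# Variant 1: different key names, then drop the leftover right-hand key.
via_keys = pd.merge(df1, df2, left_on="id", right_on="id1", how="left").drop("id1", axis=1)

# Variant 2: rename the key first so both frames share 'id';
# merge then keeps a single key column and nothing needs dropping.
via_rename = pd.merge(df1, df2.rename(columns={"id1": "id"}), on="id", how="left")

print(via_keys)
print(via_rename)
```

Renaming first saves the extra drop step, since merging on a shared column name keeps only one copy of the key.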
If you need only the price column, the simplest approach is map:
df1['price'] = df1.id.map(df2.set_index('id1')['price'])
print (df1)
id name count price
0 1 a 10 100.0
1 2 b 20 200.0
2 3 c 30 300.0
3 4 d 40 NaN
4 5 e 50 500.0
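The map approach above works like a lookup table: a Series of prices indexed by id1 is consulted once per value of id. A minimal sketch, again with frames rebuilt from the question's sample data and a hypothetical intermediate name price_by_id, assuming id1 values are unique in df2:

```python
import pandas as pd

# Frames rebuilt from the question's sample data (construction is assumed).
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["a", "b", "c", "d", "e"],
    "count": [10, 20, 30, 40, 50],
})
df2 = pd.DataFrame({
    "id1": [1, 2, 3, 5],
    "price": [100, 200, 300, 500],
    "rating": [1.0, 2.0, 3.0, 5.0],
})

# Series indexed by id1: acts as a lookup from id to price.
price_by_id = df2.set_index("id1")["price"]

# This sketch assumes each id1 appears once, so each id has one price.
assert price_by_id.index.is_unique

# Ids missing from the lookup (here id 4) map to NaN.
df1["price"] = df1["id"].map(price_by_id)
print(df1)
```

Unlike merge, this modifies df1 in place by adding a column, which is convenient when only one value per key is needed.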
Two more solutions:
print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left')
.drop(['id1', 'rating'], axis=1))
id name count price
0 1 a 10 100.0
1 2 b 20 200.0
2 3 c 30 300.0
3 4 d 40 NaN
4 5 e 50 500.0
print (pd.merge(df1, df2[['id1','price']], left_on='id', right_on='id1', how='left')
.drop('id1', axis=1))
id name count price
0 1 a 10 100.0
1 2 b 20 200.0
2 3 c 30 300.0
3 4 d 40 NaN
4 5 e 50 500.0
Answered by piRSquared
join uses the index to merge on unless we specify a column to use instead. However, we can specify a column in place of the index only for the left dataframe.
Strategy:
- set_index on df2 to be id1
- use join with df as the left dataframe and id as the on parameter

Note that I could have called set_index('id') on df to avoid having to use the on parameter. However, doing it this way let me leave the column in the dataframe rather than having to reset_index later.
df.join(df2.set_index('id1'), on='id')
id name count price rating
0 1 a 10 100.0 1.0
1 2 b 20 200.0 2.0
2 3 c 30 300.0 3.0
3 4 d 40 NaN NaN
4 5 e 50 500.0 5.0
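The strategy above can be sketched as a runnable script. The frames are rebuilt from the question's sample data, and the names right and joined are hypothetical helpers introduced only for this illustration.

```python
import pandas as pd

# Frames rebuilt from the question's sample data (construction is assumed).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["a", "b", "c", "d", "e"],
    "count": [10, 20, 30, 40, 50],
})
df2 = pd.DataFrame({
    "id1": [1, 2, 3, 5],
    "price": [100, 200, 300, 500],
    "rating": [1.0, 2.0, 3.0, 5.0],
})

# Step 1: move id1 into the index of the right-hand frame.
right = df2.set_index("id1")

# Step 2: join matches df's 'id' column against right's index.
# join defaults to a left join, so every row of df is kept.
joined = df.join(right, on="id")
print(joined)
```

Because id stays an ordinary column in df, the result keeps df's original row index and needs no reset_index afterwards.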
If you only want price from df2:
df.join(df2.set_index('id1')[['price']], on='id')
id name count price
0 1 a 10 100.0
1 2 b 20 200.0
2 3 c 30 300.0
3 4 d 40 NaN
4 5 e 50 500.0