Original URL: http://stackoverflow.com/questions/44781633/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow
Join pandas dataframes based on column values
Question by fremorie
I'm quite new to pandas DataFrames, and I'm having some trouble joining two tables.
The first df has just 3 columns:
DF1:
item_id  position  document_id
    336         1           10
    337         2           10
    338         3           10
   1001         1           11
   1002         2           11
   1003         3           11
     38        10          146
The second has the same two key columns, item_id and document_id (plus plenty of others):
DF2:
item_id  document_id  col1  col2  col3  ...
    337           10   ...   ...   ...
   1002           11   ...   ...   ...
   1003           11   ...   ...   ...
What I need is to perform an operation which, in SQL, would look as follows:
DF1 JOIN DF2 ON
    DF1.document_id = DF2.document_id
    AND DF1.item_id = DF2.item_id
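To make the target semantics concrete, here is a minimal sketch that runs this join with Python's built-in sqlite3 module. It assumes sample data: DF1 exactly as listed above, and for DF2 the illustrative col1..col3 values that appear in the answer's printout (the question elides them as "..."). The table names, the SELECT column list and the ORDER BY are choices made for this sketch only.

```python
import sqlite3

import pandas as pd

# DF1 as listed in the question.
df1 = pd.DataFrame({
    "item_id": [336, 337, 338, 1001, 1002, 1003, 38],
    "position": [1, 2, 3, 1, 2, 3, 10],
    "document_id": [10, 10, 10, 11, 11, 11, 146],
})
# DF2 keys from the question; col1..col3 are the illustrative values from the answer.
df2 = pd.DataFrame({
    "item_id": [337, 1002, 1003],
    "document_id": [10, 11, 11],
    "col1": ["s", "d", "f"],
    "col2": [4, 5, 7],
    "col3": [7, 8, 0],
})

# Copy both frames into an in-memory SQLite database as tables DF1 and DF2.
con = sqlite3.connect(":memory:")
df1.to_sql("DF1", con, index=False)
df2.to_sql("DF2", con, index=False)

# The join from the question, returning DF2's columns plus DF1.position.
sql_result = pd.read_sql_query(
    """
    SELECT DF2.item_id, DF2.document_id, DF1.position,
           DF2.col1, DF2.col2, DF2.col3
    FROM DF1
    JOIN DF2
      ON DF1.document_id = DF2.document_id
     AND DF1.item_id = DF2.item_id
    ORDER BY DF2.item_id
    """,
    con,
)
con.close()
print(sql_result)
```

Only rows whose (document_id, item_id) pair exists in both tables survive, which is the inner-join behaviour the pandas answer reproduces.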
As a result, I want to see DF2 with the 'position' column added:
item_id document_id position col1 col2 col3 ...
What is a good way to do this using pandas?
Thank you!
Answer by jezrael
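I think you need merge with the default inner join, but there must be no duplicated combinations of values in the two key columns; otherwise each duplicated (document_id, item_id) pair produces extra rows in the result: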
import pandas as pd

print(df2)
   item_id  document_id  col1  col2  col3
0      337           10     s     4     7
1     1002           11     d     5     8
2     1003           11     f     7     0
# Inner join (the default) on both key columns, like the SQL JOIN ... ON ... AND ...
df = pd.merge(df1, df2, on=['document_id', 'item_id'])
print(df)
   item_id  position  document_id  col1  col2  col3
0      337         2           10     s     4     7
1     1002         2           11     d     5     8
2     1003         3           11     f     7     0
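The no-duplicates condition can be checked by pandas itself: merge accepts a validate argument, and validate="one_to_one" raises pandas.errors.MergeError when the key combinations are not unique in either frame. A minimal sketch, assuming the same sample data (DF2's col1..col3 values taken from the printout above); the extra DF1 row for item 337 in document 10 is hypothetical, added only to trigger the check.

```python
import pandas as pd
from pandas.errors import MergeError

keys = ["document_id", "item_id"]

df1 = pd.DataFrame({
    "item_id": [336, 337, 338, 1001, 1002, 1003, 38],
    "position": [1, 2, 3, 1, 2, 3, 10],
    "document_id": [10, 10, 10, 11, 11, 11, 146],
})
df2 = pd.DataFrame({
    "item_id": [337, 1002, 1003],
    "document_id": [10, 11, 11],
    "col1": ["s", "d", "f"],
    "col2": [4, 5, 7],
    "col3": [7, 8, 0],
})

# With unique keys the validated merge behaves like the plain inner join.
df = pd.merge(df1, df2, on=keys, validate="one_to_one")
print(len(df))  # 3

# Hypothetical duplicate: a second DF1 row for item 337 in document 10.
extra = pd.DataFrame({"item_id": [337], "position": [5], "document_id": [10]})
df1_dup = pd.concat([df1, extra], ignore_index=True)

# Unvalidated, both copies match DF2's 337 row, so one extra row appears.
unchecked = pd.merge(df1_dup, df2, on=keys)
print(len(unchecked))  # 4

# Validated, the same merge raises instead of returning the extra row.
raised = False
try:
    pd.merge(df1_dup, df2, on=keys, validate="one_to_one")
except MergeError as exc:
    raised = True
    print("MergeError:", exc)
```

validate also accepts "one_to_many" and "many_to_one" when only one side is expected to have unique keys.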
But if you need the position column as the third column (right after item_id and document_id):
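# Put df2 on the left so its columns come first; position ends up last
df = pd.merge(df2, df1, on=['document_id', 'item_id'])
cols = df.columns.tolist()
# Reorder: the two key columns, then position (the last column), then the rest
df = df[cols[:2] + cols[-1:] + cols[2:-1]]
print(df)
   item_id  document_id  position  col1  col2  col3
0      337           10         2     s     4     7
1     1002           11         2     d     5     8
2     1003           11         3     f     7     0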
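Instead of slicing the column list, the column can also be moved with DataFrame.pop and DataFrame.insert. A minimal sketch, again assuming the sample data from the printout above; unlike the slicing version it relies only on the column name 'position', not on that column being last after the merge.

```python
import pandas as pd

df1 = pd.DataFrame({
    "item_id": [336, 337, 338, 1001, 1002, 1003, 38],
    "position": [1, 2, 3, 1, 2, 3, 10],
    "document_id": [10, 10, 10, 11, 11, 11, 146],
})
df2 = pd.DataFrame({
    "item_id": [337, 1002, 1003],
    "document_id": [10, 11, 11],
    "col1": ["s", "d", "f"],
    "col2": [4, 5, 7],
    "col3": [7, 8, 0],
})

# df2 on the left keeps its column order; 'position' is appended at the end.
df = pd.merge(df2, df1, on=["document_id", "item_id"])

# pop() removes 'position' and returns it; insert() puts it back at index 2,
# i.e. as the third column, right after item_id and document_id.
df.insert(2, "position", df.pop("position"))
print(df)
```

Note that insert modifies the frame in place and returns None, so its result should not be assigned back to df.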