Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/44781633/

Time: 2020-09-14 03:53:27  Source: igfitidea

Join pandas dataframes based on column values

Tags: python, mysql, sql, pandas, dataframe

Asked by fremorie

I'm quite new to pandas dataframes, and I'm having some trouble joining two tables.

The first df has just 3 columns:

DF1:
item_id    position    document_id
336        1           10
337        2           10
338        3           10
1001       1           11
1002       2           11
1003       3           11
38         10          146

The second has the same two key columns, item_id and document_id (plus plenty of others):

DF2:
item_id    document_id    col1    col2   col3    ...
337        10             ...     ...    ...
1002       11             ...     ...    ...
1003       11             ...     ...    ...
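To make the examples reproducible, both tables can be rebuilt as DataFrames. This is a sketch: the names df1 and df2 follow the answer, and the col1..col3 values are placeholders copied from jezrael's printout of df2, since the question elides them with "...".

```python
import pandas as pd

# DF1 from the question: the two key columns plus 'position'
df1 = pd.DataFrame({
    'item_id':     [336, 337, 338, 1001, 1002, 1003, 38],
    'position':    [1, 2, 3, 1, 2, 3, 10],
    'document_id': [10, 10, 10, 11, 11, 11, 146],
})

# DF2: the same key columns plus data columns.
# col1..col3 are placeholder values taken from the answer, not the asker's real data.
df2 = pd.DataFrame({
    'item_id':     [337, 1002, 1003],
    'document_id': [10, 11, 11],
    'col1':        ['s', 'd', 'f'],
    'col2':        [4, 5, 7],
    'col3':        [7, 8, 0],
})
```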

What I need is to perform an operation which, in SQL, would look as follows:

SELECT DF2.*, DF1.position
FROM DF1
JOIN DF2
  ON DF1.document_id = DF2.document_id
 AND DF1.item_id = DF2.item_id;
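One way to check that a pandas merge computes the same rows as the SQL join above is to load both frames into an in-memory SQLite database and compare the two results. A minimal sketch, assuming SQLite as a stand-in for MySQL and using a trimmed version of the question's data with a single placeholder data column, col1:

```python
import sqlite3
import pandas as pd

# Trimmed sample data; col1 holds placeholder values
df1 = pd.DataFrame({'item_id': [336, 337, 1002, 1003],
                    'position': [1, 2, 2, 3],
                    'document_id': [10, 10, 11, 11]})
df2 = pd.DataFrame({'item_id': [337, 1002, 1003],
                    'document_id': [10, 11, 11],
                    'col1': ['s', 'd', 'f']})

# Load both frames into an in-memory SQLite database
con = sqlite3.connect(':memory:')
df1.to_sql('DF1', con, index=False)
df2.to_sql('DF2', con, index=False)

# The SQL join on both key columns
sql_result = pd.read_sql_query(
    """
    SELECT DF2.item_id, DF2.document_id, DF1.position, DF2.col1
    FROM DF1
    JOIN DF2
      ON DF1.document_id = DF2.document_id
     AND DF1.item_id = DF2.item_id
    ORDER BY DF2.item_id
    """,
    con,
)
con.close()

# The equivalent pandas inner merge, with columns and row order aligned
pandas_result = (pd.merge(df2, df1, on=['document_id', 'item_id'])
                 [['item_id', 'document_id', 'position', 'col1']]
                 .sort_values('item_id')
                 .reset_index(drop=True))
```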

As a result, I want to see DF2 complemented with the column position:

item_id    document_id    position    col1   col2   col3   ...

What is a good way to do this using pandas?

Thank you!

Answered by jezrael

I think you need merge with the default inner join. Note that each (document_id, item_id) combination must be unique, otherwise matching rows are repeated in the result:

print(df2)
   item_id  document_id col1  col2  col3
0      337           10    s     4     7
1     1002           11    d     5     8
2     1003           11    f     7     0

# inner join (the default how='inner') on both key columns
df = pd.merge(df1, df2, on=['document_id','item_id'])
print(df)
   item_id  position  document_id col1  col2  col3
0      337         2           10    s     4     7
1     1002         2           11    d     5     8
2     1003         3           11    f     7     0
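The uniqueness caveat matters in practice. A sketch with a hypothetical duplicated key pair (337, 10) in df1 shows the repeated row, and the validate argument of pd.merge turns that silent duplication into a MergeError:

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical data: the key pair (item_id=337, document_id=10) appears twice in df1
df1 = pd.DataFrame({'item_id': [337, 337, 1002],
                    'position': [2, 5, 2],
                    'document_id': [10, 10, 11]})
df2 = pd.DataFrame({'item_id': [337, 1002],
                    'document_id': [10, 11],
                    'col1': ['s', 'd']})

# The inner merge silently repeats the df2 row for item 337
merged = pd.merge(df2, df1, on=['document_id', 'item_id'])
print(len(merged))  # 3, although df2 has only 2 rows

# validate='many_to_one' requires the keys to be unique in the right frame (df1)
caught = False
try:
    pd.merge(df2, df1, on=['document_id', 'item_id'], validate='many_to_one')
except MergeError as exc:
    caught = True
    print('merge rejected:', exc)
```

When df1 is the left frame, as in pd.merge(df1, df2, ...), the corresponding check is validate='one_to_many'.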

If the position column needs to be the third column, merge with df2 on the left and then reorder the columns:

df = pd.merge(df2, df1, on=['document_id','item_id'])
# df2's columns come first, so 'position' ends up last; move it to index 2
cols = df.columns.tolist()
df = df[cols[:2] + cols[-1:] + cols[2:-1]]
print(df)
   item_id  document_id  position col1  col2  col3
0      337           10         2    s     4     7
1     1002           11         2    d     5     8
2     1003           11         3    f     7     0
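A possible alternative, sketched with the answer's sample data: merging with how='left' keeps every DF2 row even when DF1 has no matching (document_id, item_id) pair (position is then NaN), and DataFrame.insert together with DataFrame.pop moves position to the third column without rebuilding the column list. Item 999 is a hypothetical row added only to show an unmatched key.

```python
import pandas as pd

df1 = pd.DataFrame({'item_id': [336, 337, 338, 1001, 1002, 1003, 38],
                    'position': [1, 2, 3, 1, 2, 3, 10],
                    'document_id': [10, 10, 10, 11, 11, 11, 146]})
# item 999 is a hypothetical DF2 row with no match in df1
df2 = pd.DataFrame({'item_id': [337, 1002, 1003, 999],
                    'document_id': [10, 11, 11, 12],
                    'col1': ['s', 'd', 'f', 'x'],
                    'col2': [4, 5, 7, 1],
                    'col3': [7, 8, 0, 2]})

# Left join: all df2 rows survive; unmatched rows get NaN in 'position'
df = pd.merge(df2, df1, on=['document_id', 'item_id'], how='left')

# pop() removes 'position' and returns it; insert() puts it back at index 2, in place
df.insert(2, 'position', df.pop('position'))
print(df)
```

Use the default inner join instead if DF2 rows without a position should be dropped.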