Python: merge two DataFrames based on multiple keys in pandas

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/32277473/

Time: 2020-08-19 11:19:10  Source: igfitidea

Merge two DataFrames based on multiple keys in pandas

python, pandas, merge, dataframe

Asked by Surah Li

Does pandas (or another module) have any functions that support merging (or joining) two tables based on multiple keys?


For example, I have two tables (DataFrames) a and b:


>>> a
A  B  value1
1  1      23
1  2      34
2  1    2342
2  2     333

>>> b
A  B  value2
1  1    0.10
1  2    0.20
2  1    0.13
2  2    0.33

The desired result is:


A  B  value1  value2
1  1      23    0.10
1  2      34    0.20
2  1    2342    0.13
2  2     333    0.33

Answered by Alex Riley

To merge by multiple keys, you just need to pass the keys in a list to pd.merge:


>>> pd.merge(a, b, on=['A', 'B'])
   A  B  value1  value2
0  1  1      23    0.10
1  1  2      34    0.20
2  2  1    2342    0.13
3  2  2     333    0.33

In fact, the default for pd.merge is to join on the intersection of the two DataFrames' column labels, so pd.merge(a, b) would work equally well in this case.
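
As a minimal sketch of the point above, the snippet below rebuilds the two tables from the question by hand (the constructor calls are an assumption, since the question only shows the printed frames) and checks that passing the keys explicitly and relying on the default give the same result:

```python
import pandas as pd

# The two tables from the question, rebuilt by hand for illustration.
a = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1, 2, 1, 2],
                  "value1": [23, 34, 2342, 333]})
b = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1, 2, 1, 2],
                  "value2": [0.10, 0.20, 0.13, 0.33]})

# Explicit keys: pass the shared column names as a list.
explicit = pd.merge(a, b, on=["A", "B"])

# Implicit keys: with no `on`, pandas joins on the columns
# that appear in both frames (here A and B).
implicit = pd.merge(a, b)

print(explicit)
print(explicit.equals(implicit))
```

Spelling out the keys is usually the safer habit: if the two frames later gain another column with the same name, the default would silently start joining on that column as well.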


Answered by Miguel Rueda

According to the most recent pandas documentation, the on parameter accepts a label or a list of field names, and each of them must be found in both DataFrames. Here is an MWE showing its use:


import pandas as pd

a = pd.DataFrame({'A': ['0', '0', '1', '1'], 'B': ['0', '1', '0', '1'], 'v': [True, False, False, True]})

b = pd.DataFrame({'A':['0', '0', '1','1'], 'B':['0', '1', '0','1'],'v':[False, True, True, True]})

result = pd.merge(a, b, on=['A','B'], how='inner', suffixes=['_and', '_or'])
>>> result
   A  B  v_and   v_or
0  0  0   True  False
1  0  1  False   True
2  1  0  False   True
3  1  1   True   True

on : label or list
    Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.
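
To illustrate the "must be found in both DataFrames" requirement, here is a rough sketch. The frame b_renamed and its column B_key are hypothetical and were made up for this example. When a key column has a different name on one side, on cannot resolve it, and the left_on / right_on parameters of pd.merge can be used to pair the columns instead:

```python
import pandas as pd

a = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1, 2, 1, 2],
                  "value1": [23, 34, 2342, 333]})

# Hypothetical variant of b whose second key column is named
# "B_key" rather than "B" (an assumption for illustration only).
b_renamed = pd.DataFrame({"A": [1, 1, 2, 2], "B_key": [1, 2, 1, 2],
                          "value2": [0.10, 0.20, 0.13, 0.33]})

# "B" does not exist in b_renamed, so on=["A", "B"] raises KeyError.
try:
    pd.merge(a, b_renamed, on=["A", "B"])
    on_failed = False
except KeyError:
    on_failed = True

# left_on / right_on pair the key columns up by position instead.
merged = pd.merge(a, b_renamed,
                  left_on=["A", "B"], right_on=["A", "B_key"])

print(on_failed)
print(merged)
```

Note that with left_on / right_on both differently named key columns (B and B_key) are kept in the result, so one of them may need to be dropped afterwards.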


Check out the latest pd.merge documentation for further details.
