Python 熊猫:合并(加入)多列上的两个数据框

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/41815079/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-20 01:39:26  来源:igfitidea点击:

pandas: merge (join) two data frames on multiple columns

pythonpython-3.xpandasjoin

提问by Edamame

I am trying to join two pandas data frames using two columns:

我正在尝试使用两列连接两个熊猫数据框:

new_df = pd.merge(A_df, B_df,  how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')

but got the following error:

但出现以下错误:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()

KeyError: '[B_1, c2]'

Any idea what should be the right way to do this? Thanks!

知道什么是正确的方法吗?谢谢!

回答by Shijo

Try this

尝试这个

new_df = pd.merge(A_df, B_df,  how='left', left_on=['A_c1','c2'], right_on = ['B_c1','c2'])

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

left_on : label or list, or array-like Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns

right_on : label or list, or array-like Field names to join on in right DataFrame or vector/list of vectors per left_on docs

left_on :标签或列表,或类似数组的字段名称,以加入左侧数据帧。可以是 DataFrame 长度的向量或向量列表,以使用特定向量作为连接键而不是列

right_on :标签或列表,或类似数组的字段名称,以加入右侧数据帧或每个 left_on 文档的向量/向量列表

回答by Celius Stingher

the problem here is that by using the apostrophes you are setting the value being passed to be a string, when in fact, as @Shijo stated from the documentation, the function is expecting a label or list, but not a string! If the list contains each of the name of the columns beings passed for both the left and right dataframe, then each column-name mustindividually be within apostrophes. With what has been stated, we can understand why this is inccorect:

这里的问题是,通过使用撇号,您将传递的值设置为字符串,而实际上,正如@Shijo 从文档中所述,该函数需要一个标签或列表,而不是一个字符串!如果列表包含为左右数据框传递的每个列的名称,则每个列名称必须单独位于撇号内。根据上述内容,我们可以理解为什么这是不正确的:

new_df = pd.merge(A_df, B_df,  how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')

And this is the correct way of using the function:

这是使用该功能的正确方法:

new_df = pd.merge(A_df, B_df,  how='left', left_on=['A_c1','c2'], right_on = ['B_c1','c2'])

回答by john ed

Another way of doing this: new_df = A_df.merge(B_df, left_on=['A_c1','c2'], right_on = ['B_c1','c2'], how='left')

这样做的另一种方法: new_df = A_df.merge(B_df, left_on=['A_c1','c2'], right_on = ['B_c1','c2'], how='left')