Python Pandas join issue: columns overlap but no suffix specified

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/26645515/

Time: 2020-08-19 00:48:25  Source: igfitidea

python join pandas

Asked by user308827

I have the following two data frames:

df_a =

     mukey  DI  PI
0   100000  35  14
1  1000005  44  14
2  1000006  44  14
3  1000007  43  13
4  1000008  43  13

df_b = 
    mukey  niccdcd
0  190236        4
1  190237        6
2  190238        7
3  190239        4
4  190240        7

When I try to join these two dataframes:

join_df = df_a.join(df_b,on='mukey',how='left')

I get the error:

*** ValueError: columns overlap but no suffix specified: Index([u'mukey'], dtype='object')

Why is this so? The dataframes do have common 'mukey' values.

Accepted answer by EdChum

The error on the snippet of data you posted is a little cryptic. join matches the 'mukey' column of df_a against the index of df_b, not against its 'mukey' column, so both 'mukey' columns end up in the result and you are required to supply a suffix for the left and right hand side. In the snippet you posted there are also no common values, so the columns brought in from df_b are all NaN:

In [173]:

df_a.join(df_b, on='mukey', how='left', lsuffix='_left', rsuffix='_right')
Out[173]:
       mukey_left  DI  PI  mukey_right  niccdcd
index                                          
0          100000  35  14          NaN      NaN
1         1000005  44  14          NaN      NaN
2         1000006  44  14          NaN      NaN
3         1000007  43  13          NaN      NaN
4         1000008  43  13          NaN      NaN

merge works because it doesn't have this restriction:

In [176]:

df_a.merge(df_b, on='mukey', how='left')
Out[176]:
     mukey  DI  PI  niccdcd
0   100000  35  14      NaN
1  1000005  44  14      NaN
2  1000006  44  14      NaN
3  1000007  43  13      NaN
4  1000008  43  13      NaN
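
As a rough, self-contained way to reproduce both behaviours, the sketch below rebuilds the two frames from the sample rows in the question. The variable names join_error and merged are made up for illustration, and the exact wording of the error message can vary between pandas versions.

```python
import pandas as pd

# Frames rebuilt from the sample rows shown in the question.
df_a = pd.DataFrame({
    "mukey": [100000, 1000005, 1000006, 1000007, 1000008],
    "DI": [35, 44, 44, 43, 43],
    "PI": [14, 14, 14, 13, 13],
})
df_b = pd.DataFrame({
    "mukey": [190236, 190237, 190238, 190239, 190240],
    "niccdcd": [4, 6, 7, 4, 7],
})

# join() would keep both 'mukey' columns, so without suffixes the names collide.
try:
    df_a.join(df_b, on="mukey", how="left")
    join_error = None
except ValueError as exc:
    join_error = str(exc)

# merge() treats 'mukey' as the key in both frames and keeps a single copy.
merged = df_a.merge(df_b, on="mukey", how="left")

print(join_error)
print(merged)
```

Because none of the sample 'mukey' values appear in both frames, the niccdcd column of the merged result contains only NaN.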

Answer by Velizar VESSELINOV

The .join() function uses the index of the dataset passed as an argument, so you should either call set_index on that dataset first or use the .merge function instead.

Here are two examples that should work in your case:

join_df = df_a.join(df_b.set_index('mukey'), on='mukey', how='left')

or

join_df = df_a.merge(df_b, on='mukey', how='left')
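
To see the two approaches side by side, here is a small sketch. The frames left and right and their values are invented for illustration (they are not the question's data) and were chosen so that some 'mukey' values actually match.

```python
import pandas as pd

# Hypothetical frames with some shared 'mukey' values (not the question's data).
left = pd.DataFrame({"mukey": [1, 2, 3], "DI": [35, 44, 43]})
right = pd.DataFrame({"mukey": [2, 3, 4], "niccdcd": [6, 7, 4]})

# join(): look up left['mukey'] in the index of the right-hand frame.
joined = left.join(right.set_index("mukey"), on="mukey", how="left")

# merge(): match the 'mukey' columns of both frames directly.
merged = left.merge(right, on="mukey", how="left")

print(joined)
print(merged)
```

Both calls should give the same rows here: the key with no match on the right gets NaN in niccdcd, and only one 'mukey' column remains because set_index moved the right-hand one into the index.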

Answer by user1761806

This error indicates that the two tables have one or more columns with the same name. The error message translates to: "I can see the same column in both tables, but you haven't told me to rename either of them before bringing one of them in."

You either want to delete one of the columns before bringing the other one in, using del df['column name'], or use lsuffix to rename the original column, or rsuffix to rename the one that is being brought in.

df_a.join(df_b, on='mukey', how='left', lsuffix='_left', rsuffix='_right')
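
The options can be sketched as follows. The frames are hypothetical, and drop(columns=...) is used here in place of del so that the original frame is left untouched; both joins align rows on the default index, which is an assumption of this example.

```python
import pandas as pd

# Hypothetical frames that share the column name 'mukey'.
df_a = pd.DataFrame({"mukey": [10, 11], "DI": [35, 44]})
df_b = pd.DataFrame({"mukey": [10, 11], "niccdcd": [4, 6]})

# Option 1: drop the clashing column from the frame being brought in,
# then join on the default row index.
dropped = df_a.join(df_b.drop(columns="mukey"))

# Option 2: keep both columns and let the suffixes rename them.
suffixed = df_a.join(df_b, lsuffix="_left", rsuffix="_right")

print(list(dropped.columns))
print(list(suffixed.columns))
```

Note that the suffixes are only applied to column names that overlap; DI and niccdcd keep their names.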

Answer by user12690524

join is mainly meant for joining on the index, not on column names. Rename the overlapping columns so that the two dataframes no longer share a column name, and the join will go through; otherwise this error is raised.
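
A minimal sketch of the renaming approach, assuming the rows of the two frames line up on the default index. The frames and the new name 'mukey_b' are made up for illustration.

```python
import pandas as pd

# Hypothetical frames that share the column name 'mukey'.
df_a = pd.DataFrame({"mukey": [10, 11], "DI": [35, 44]})
df_b = pd.DataFrame({"mukey": [10, 11], "niccdcd": [4, 6]})

# Rename the clashing column ('mukey_b' is an arbitrary name),
# then join row by row on the default index.
renamed = df_a.join(df_b.rename(columns={"mukey": "mukey_b"}))

print(list(renamed.columns))
```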