Pandas join issue: columns overlap but no suffix specified
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/26645515/
Asked by user308827
I have the following two data frames:
df_a =
mukey DI PI
0 100000 35 14
1 1000005 44 14
2 1000006 44 14
3 1000007 43 13
4 1000008 43 13
df_b =
mukey niccdcd
0 190236 4
1 190237 6
2 190238 7
3 190239 4
4 190240 7
When I try to join these two dataframes:
join_df = df_a.join(df_b,on='mukey',how='left')
I get the error:
*** ValueError: columns overlap but no suffix specified: Index([u'mukey'], dtype='object')
Why is this so? The dataframes do have common 'mukey' values.
Accepted answer by EdChum
The error on the snippet of data you posted is a little cryptic. There are no common values in it, and because both frames contain a 'mukey' column, join requires you to supply a suffix for the left- and right-hand sides:
In [173]:
df_a.join(df_b, on='mukey', how='left', lsuffix='_left', rsuffix='_right')
Out[173]:
mukey_left DI PI mukey_right niccdcd
index
0 100000 35 14 NaN NaN
1 1000005 44 14 NaN NaN
2 1000006 44 14 NaN NaN
3 1000007 43 13 NaN NaN
4 1000008 43 13 NaN NaN
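A minimal sketch reproducing the error and the suffix workaround. The frames are made-up stand-ins shaped like the ones in the question, not the asker's data. It also illustrates why the suffixed result is all NaN even when the two 'mukey' columns share values: with on='mukey', join looks up the left column's values in the right frame's index (0, 1, 2 here), not in its 'mukey' column.

```python
import pandas as pd

# Illustrative data only; two 'mukey' values are deliberately shared.
df_a = pd.DataFrame({"mukey": [100000, 1000005, 1000006],
                     "DI": [35, 44, 44],
                     "PI": [14, 14, 14]})
df_b = pd.DataFrame({"mukey": [100000, 1000005, 190238],
                     "niccdcd": [4, 6, 7]})

# Without suffixes, the shared column name 'mukey' triggers the ValueError.
try:
    df_a.join(df_b, on="mukey", how="left")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# With suffixes the call succeeds, but df_a['mukey'] is matched against
# df_b's index (0, 1, 2), so nothing lines up and 'niccdcd' is all NaN.
joined = df_a.join(df_b, on="mukey", how="left",
                   lsuffix="_left", rsuffix="_right")
print(error_message)
print(joined)
```

So the suffixes silence the error, but they do not by themselves make join match on the 'mukey' values.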
merge works because it doesn't have this restriction:
In [176]:
df_a.merge(df_b, on='mukey', how='left')
Out[176]:
mukey DI PI niccdcd
0 100000 35 14 NaN
1 1000005 44 14 NaN
2 1000006 44 14 NaN
3 1000007 43 13 NaN
4 1000008 43 13 NaN
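For comparison, a small sketch of merge on frames that do share some keys. The values are invented for illustration. Unlike join, merge with on='mukey' matches the 'mukey' column of one frame against the 'mukey' column of the other, so the shared keys pick up their 'niccdcd' values and only the unmatched row is NaN.

```python
import pandas as pd

# Illustrative data only; the first two 'mukey' values appear in both frames.
df_a = pd.DataFrame({"mukey": [100000, 1000005, 1000006],
                     "DI": [35, 44, 44],
                     "PI": [14, 14, 14]})
df_b = pd.DataFrame({"mukey": [100000, 1000005, 190238],
                     "niccdcd": [4, 6, 7]})

# Column-to-column match on 'mukey'; the key column appears once in the result.
merged = df_a.merge(df_b, on="mukey", how="left")
print(merged)
```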
Answer by Velizar VESSELINOV
The .join() function uses the index of the dataset passed as its argument, so you should either call set_index on that dataset or use the .merge function instead.
Here are two examples that should work in your case:
join_df = df_a.join(df_b.set_index('mukey'), on='mukey', how='left')
or
join_df = df_a.merge(df_b, on='mukey', how='left')
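A runnable sketch of the set_index variant, on invented data with partly shared keys. Moving 'mukey' into the right frame's index removes the overlapping column name, so no suffix is needed, and it gives join something meaningful to match the left 'mukey' column against.

```python
import pandas as pd

# Illustrative data only.
df_a = pd.DataFrame({"mukey": [100000, 1000005, 1000006],
                     "DI": [35, 44, 44],
                     "PI": [14, 14, 14]})
df_b = pd.DataFrame({"mukey": [100000, 1000005, 190238],
                     "niccdcd": [4, 6, 7]})

# df_b's 'mukey' becomes its index, so df_a['mukey'] is matched against it.
join_df = df_a.join(df_b.set_index("mukey"), on="mukey", how="left")
print(join_df)
```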
Answer by user1761806
This error indicates that the two tables have one or more columns with the same name. The error message translates to: "I can see the same column in both tables, but you haven't told me to rename either one before bringing it in."
You either want to delete one of the columns before bringing it in from the other frame, using del df['column name'], or use lsuffix to rename the original column, or rsuffix to rename the one that is being brought in.
df_a.join(df_b, on='mukey', how='left', lsuffix='_left', rsuffix='_right')
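A short sketch of the "remove or rename the overlapping column" options, on invented data. It uses drop(columns=...) and rename(columns=...), which return new frames, as a non-mutating alternative to del; the name 'mukey_b' is a hypothetical choice. Note that without on=, join aligns rows by index, so these calls pair rows by position here rather than by 'mukey' value.

```python
import pandas as pd

# Illustrative data only; both frames use the default 0..2 row index.
df_a = pd.DataFrame({"mukey": [100000, 1000005, 1000006],
                     "DI": [35, 44, 44],
                     "PI": [14, 14, 14]})
df_b = pd.DataFrame({"mukey": [190236, 190237, 190238],
                     "niccdcd": [4, 6, 7]})

# Option 1: drop the overlapping column from the right frame before joining.
joined_drop = df_a.join(df_b.drop(columns="mukey"), how="left")

# Option 2: rename it so both key columns survive without suffixes.
joined_renamed = df_a.join(df_b.rename(columns={"mukey": "mukey_b"}), how="left")

print(joined_drop)
print(joined_renamed)
```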
Answer by user12690524
join is mainly used to join on the index, not on column names. So give the overlapping columns different names in the two dataframes and then join; otherwise this error is raised.
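As a sketch of the index-based usage described here (data invented for illustration): if 'mukey' is made the index of both frames, join matches rows on it directly and no column names overlap, so no suffix is required.

```python
import pandas as pd

# Illustrative data only.
df_a = pd.DataFrame({"mukey": [100000, 1000005, 1000006],
                     "DI": [35, 44, 44],
                     "PI": [14, 14, 14]})
df_b = pd.DataFrame({"mukey": [100000, 1000005, 190238],
                     "niccdcd": [4, 6, 7]})

# Index-to-index join: 'mukey' is the row label on both sides.
left = df_a.set_index("mukey")
right = df_b.set_index("mukey")
joined = left.join(right, how="left")
print(joined)
```

Calling joined.reset_index() afterwards turns 'mukey' back into an ordinary column if that layout is preferred.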

