Original question: http://stackoverflow.com/questions/26027877/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Join two DataFrames on one key column / ERROR: 'columns overlap but no suffix specified'
Asked by Yumi
I have two tables, a sales table and a product table, which share the 'PART NUMBER' column. 'PART NUMBER' is not unique in the sales table, but it is unique in the product table. (The original post included a snapshot image of both tables.)




I was trying to add the matching 'Description' to each 'PART NUMBER' in the sales table, and I followed the examples from the pandas website. My code:
sales.join(part_table, on='PART NUMBER')
But I got this error:
ValueError: columns overlap but no suffix specified: Index([u'PART NUMBER'], dtype='object')
Can someone explain what this error means and how to solve it?
Many thanks!
Answered by Andy Hayden
I think you want to do a merge rather than a join:
sales.merge(part_table)
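As a rough sketch of how that could look for the tables in the question: the data below is made up, and the 'QTY' column and the part numbers are purely illustrative. Passing on= and how='left' explicitly is an assumption about what the asker wants (keep every sales row, add 'Description' where a part matches); validate='many_to_one' is an optional check that 'PART NUMBER' really is unique in the product table.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's tables (values are invented).
sales = pd.DataFrame({
    'PART NUMBER': ['P1', 'P2', 'P1', 'P3'],
    'QTY': [5, 1, 2, 7],
})
part_table = pd.DataFrame({
    'PART NUMBER': ['P1', 'P2'],
    'Description': ['Bolt', 'Nut'],
})

# how='left' keeps every sales row, even when no product matches.
# validate='many_to_one' raises if 'PART NUMBER' repeats in part_table.
result = sales.merge(
    part_table,
    on='PART NUMBER',
    how='left',
    validate='many_to_one',
)
print(result)
```

Sales rows whose part number is missing from the product table (here 'P3') get NaN in 'Description' instead of being dropped.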
Here's an example dataframe:
In [11]: dfa = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
In [12]: dfb = pd.DataFrame([[1, 'a'], [3, 'b'], [3, 'c']], columns=['A', 'C'])
In [13]: dfa.join(dfb, on=['A'])
ValueError: columns overlap but no suffix specified: Index([u'A'], dtype='object')
In [14]: dfa.merge(dfb)
Out[14]:
A B C
0 1 2 a
1 3 4 b
2 3 4 c
It's unclear from the docs whether this error is intentional (I thought that on would be used as the column), but if you follow the exception message and add suffixes, we can see what's going on:
In [21]: dfb.join(dfa, on=['A'], lsuffix='_a', rsuffix='_b')
Out[21]:
A_a C A_b B
0 1 a 3 4
1 3 b NaN NaN
2 3 c NaN NaN
In [22]: dfb.join(dfa, lsuffix='_a', rsuffix='_b')
Out[22]:
A_a C A_b B
0 1 a 1 2
1 3 b 3 4
2 3 c NaN NaN
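So join is not matching the two 'A' columns against each other. With on, it matches dfb's 'A' values against dfa's index (In [21]); without it, it simply joins index to index (In [22]). Either way dfa's own 'A' column is carried along as ordinary data, which is why the names overlap.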
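If you do want to stay with join, one possible workaround is to move the key into the index of the right-hand frame first. This is a sketch that relies on the documented behaviour that join's on names a column in the calling frame and matches it against the index of the other frame; it reuses the dfa and dfb frames from the example above.

```python
import pandas as pd

dfa = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
dfb = pd.DataFrame([[1, 'a'], [3, 'b'], [3, 'c']], columns=['A', 'C'])

# With 'A' as dfa's index, dfb['A'] has the right values to match against,
# and dfa no longer contributes an overlapping 'A' column.
joined = dfb.join(dfa.set_index('A'), on='A')
print(joined)
```

No suffixes are needed here, and each dfb row picks up the 'B' value for its own 'A'. Applied to the question, the same idea would be sales.join(part_table.set_index('PART NUMBER'), on='PART NUMBER').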

