pandas: joining two DataFrames on a key column fails with "columns overlap but no suffix specified"

Question

Asked by Yumi

I have two tables: sales table & product table and these two tables share the 'PART NUMBER' column. The 'PART NUMBER' column in the sales table is not unique, but it is unique in the product table. (see image below of a snapshot of the sales table & product table)

[Images: snapshots of the sales table and the product table]

I was trying to add the matching 'Description' to each 'PART NUMBER' in the sales table, and I followed the examples from the pandas website. My code:

sales.join(part_table, on='PART NUMBER')

But I got this error:

ValueError: columns overlap but no suffix specified: Index([u'PART NUMBER'], dtype='object')

Can someone explain what this error means and how to solve it?

Many thanks!

Answer 1

Answered by Andy Hayden

I think you want to do a merge rather than a join:

sales.merge(part_table)
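
Applied to the tables in the question, the merge might look like the sketch below. The data is made up for illustration (the `QTY` column and the part numbers are assumptions, not taken from the original tables); `how="left"` and `validate="many_to_one"` are optional extras that keep every sales row and check that 'PART NUMBER' really is unique in the product table.

```python
import pandas as pd

# Hypothetical stand-ins for the two tables described in the question.
sales = pd.DataFrame({
    "PART NUMBER": ["A100", "B200", "A100"],  # not unique here
    "QTY": [5, 1, 2],
})
part_table = pd.DataFrame({
    "PART NUMBER": ["A100", "B200"],          # unique here
    "Description": ["Widget", "Gadget"],
})

# how="left" keeps every sales row, even if a part has no match.
# validate="many_to_one" raises if 'PART NUMBER' is duplicated in part_table.
result = sales.merge(
    part_table, on="PART NUMBER", how="left", validate="many_to_one"
)
print(result)
```

Each sales row keeps its position and gains a 'Description' column looked up from the product table.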

Here's an example dataframe:

In [10]: import pandas as pd

In [11]: dfa = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [12]: dfb = pd.DataFrame([[1, 'a'], [3, 'b'], [3, 'c']], columns=['A', 'C'])

In [13]: dfa.join(dfb, on=['A'])
ValueError: columns overlap but no suffix specified: Index([u'A'], dtype='object')

In [14]: dfa.merge(dfb)
Out[14]:
   A  B  C
0  1  2  a
1  3  4  b
2  3  4  c
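
With no arguments, merge uses the columns the two frames have in common as keys and performs an inner join. A small sketch, reusing the same example frames, that spells those defaults out and checks that the result is unchanged:

```python
import pandas as pd

dfa = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
dfb = pd.DataFrame([[1, "a"], [3, "b"], [3, "c"]], columns=["A", "C"])

# Keys inferred from the shared column name 'A'; inner join by default.
implicit = dfa.merge(dfb)

# The same operation with the key and join type written explicitly.
explicit = dfa.merge(dfb, on="A", how="inner")

print(implicit.equals(explicit))
```

Writing `on=` explicitly is usually safer when the frames might share more than one column name.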

It's unclear from the docs whether this is intentional (I thought that on would be used as the column), but following the exception message, if you add suffixes we can see what's going on:

In [21]: dfb.join(dfa, on=['A'], lsuffix='_a', rsuffix='_b')
Out[21]:
   A_a  C  A_b   B
0    1  a    3   4
1    3  b  NaN NaN
2    3  c  NaN NaN

In [22]: dfb.join(dfa, lsuffix='_a', rsuffix='_b')
Out[22]:
   A_a  C  A_b   B
0    1  a    1   2
1    3  b    3   4
2    3  c  NaN NaN

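So join is not matching 'A' against 'A'. With on=['A'], the values in dfb's 'A' column are looked up in dfa's index (1 finds row 1, 3 finds nothing); without on, the two frames are joined index to index. Either way both 'A' columns are kept, which is why a suffix is required.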
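
If join is still the preferred call, one option is to move the key of the other frame into its index first, since DataFrame.join matches against the other frame's index. A minimal sketch with the same example frames:

```python
import pandas as pd

dfa = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
dfb = pd.DataFrame([[1, "a"], [3, "b"], [3, "c"]], columns=["A", "C"])

# set_index("A") turns dfa's key column into its index, so dfb's 'A'
# values are looked up against it and no column names overlap.
joined = dfb.join(dfa.set_index("A"), on="A")
print(joined)
```

For the tables in the question, the equivalent call would be sales.join(part_table.set_index('PART NUMBER'), on='PART NUMBER').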
