Pandas merge creates unwanted duplicate entries

Question

Asked by Nic Scozzaro

I'm new to Pandas and I want to merge two datasets that have similar columns. The columns share many values, but each also has some values the other lacks. Some values are repeated within a column, and I'd like to keep those repeats. My desired output is shown below. Adding how='inner' or how='outer' does not give the desired result.

import pandas as pd

dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}

df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)

print(pd.merge(df1,df2))

output:
   A
0  2
1  2
2  2
3  2
4  3
5  4
6  5

desired/expected output:
   A
0  2
1  2
2  3
3  4
4  5

Please let me know how/if I can achieve the desired output using merge, thank you!
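For reference, the how argument only decides what happens to key values that are missing from one side. Here every value appears in both frames, so it changes nothing. A minimal check on the question's data follows (assuming a recent pandas; the loop over join types is only for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})
df2 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})

# Every key value exists on both sides, so no join type has unmatched rows
# to add or drop; they all keep the same set of matches.
results = {
    how: pd.merge(df1, df2, on='A', how=how)
    for how in ['inner', 'outer', 'left', 'right']
}

for how, merged in results.items():
    # Each result has 7 rows: 2 * 2 = 4 rows for the key 2, plus 3, 4 and 5.
    print(how, len(merged), merged['A'].tolist())
```

The four 2's come from each 2 on the left matching each 2 on the right. That is a property of the keys, not of the join type.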

EDIT: To clarify why this behavior confuses me: if I simply add another column, the merge does not produce four 2's, only two. So I would expect my first example to have only two 2's as well. Why does the behavior seem to change? What is pandas doing?

import pandas as pd
dict1 = {'A':[2,2,3,4,5],
         'B':['red','orange','yellow','green','blue'],
        }
dict2 = {'A':[2,2,3,4,5],
         'B':['red','orange','yellow','green','blue'],
        }

df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)

print(pd.merge(df1,df2))

output:
   A       B
0  2     red
1  2  orange
2  3  yellow
3  4   green
4  5    blue

However, based on the first example I would expect:
   A       B
0  2     red
1  2  orange
2  2     red
3  2  orange
4  3  yellow
5  4   green
6  5    blue
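Here is a sketch of what is going on. When no on argument is given, pd.merge joins on every column the two frames share. In the first example that is A alone, so each 2 on the left matches each 2 on the right (2 × 2 = 4 rows). In the EDIT the key is the pair (A, B), and (2, 'red') and (2, 'orange') each occur once per side (1 × 1 = 1 row each). The helper expected_merge_rows below is hypothetical and exists only to make that arithmetic explicit:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [2, 2, 3, 4, 5],
                    'B': ['red', 'orange', 'yellow', 'green', 'blue']})
df2 = df1.copy()


def expected_merge_rows(left, right, keys):
    """Hypothetical helper: rows an inner merge on `keys` should produce,
    i.e. (occurrences in left) * (occurrences in right) per key value."""
    left_counts = left.groupby(keys).size()
    right_counts = right.groupby(keys).size()
    # Multiplication aligns on the key index; keys present on only one
    # side become NaN and are dropped, since they produce no inner rows.
    return int((left_counts * right_counts).dropna().sum())


# With no `on`, pd.merge uses every shared column, here ['A', 'B'].
both = pd.merge(df1, df2)
# Restricting the key to 'A' reproduces the first example's blow-up;
# the non-key column B is kept twice, as B_x and B_y.
only_a = pd.merge(df1, df2, on='A')

print(len(both), expected_merge_rows(df1, df2, ['A', 'B']))
print(len(only_a), expected_merge_rows(df1, df2, ['A']))
```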

Answer 1

Accepted answer by Rene

import pandas as pd

dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}

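# reset_index() turns each row's position into a column named 'index'
df1 = pd.DataFrame(dict1).reset_index()
df2 = pd.DataFrame(dict2).reset_index()

# Merging on 'A' alone pairs every 2 with every 2; keep only the pairs
# whose original row positions agree (index_x == index_y)
df = df1.merge(df2, on = 'A')
df = df.loc[df['index_x'] == df['index_y'], ['A']].reset_index(drop=True)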

print(df)

Output:

   A
0  2
1  2
2  3
3  4
4  5
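The approach above relies on the duplicates sitting at the same row positions in both frames. A more general variant numbers each repeat of a value with groupby().cumcount() and merges on the value plus that counter. This is a different technique from the accepted answer, and it assumes you want the k-th occurrence of a value on the left paired with the k-th occurrence on the right. The occurrence column name is hypothetical:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})
df2 = pd.DataFrame({'A': [5, 2, 4, 2, 3]})  # same values, different row order

# Hypothetical helper column 'occurrence': 0 the first time a value appears
# in a frame, 1 the second time, and so on.
left = df1.assign(occurrence=df1.groupby('A').cumcount())
right = df2.assign(occurrence=df2.groupby('A').cumcount())

# (A, occurrence) is unique on both sides, so the merge is one-to-one and
# validate='one_to_one' would raise if that ever stopped being true.
result = (left.merge(right, on=['A', 'occurrence'], validate='one_to_one')
              .drop(columns='occurrence'))
print(result)
```

The result keeps both 2's even though df2 lists its rows in a different order, which is where position-based matching would break. If one side had fewer copies of a value, the inner merge would keep only as many copies as the smaller side has.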

Answer 2

Answer by AsheKetchum

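import pandas as pd

dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}

df1 = pd.DataFrame(dict1)
df1['index'] = range(len(df1))  # row position, used as a second merge key
df2 = pd.DataFrame(dict2)
df2['index'] = range(len(df2))

# With no `on`, merge uses both shared columns ('A' and 'index') as keys.
# drop() returns a new frame; inplace=True on a temporary would just return None.
df = df1.merge(df2).drop(columns='index')
print(df)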

The idea is to merge on matching row positions as well as matching 'A' values.
Without the position column, merge matches on 'A' alone. The first 2 in df1 is then matched with both the first and second 2 in df2, and the second 2 in df1 is likewise matched with both of them, which gives four rows with A = 2.

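Run the following to see those pairings. Merging on 'A' alone keeps both position columns, as index_x and index_y, so each row shows which position in df1 was paired with which position in df2.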

import pandas as pd

dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}

df1 = pd.DataFrame(dict1)
df1['index'] = range(len(df1))
df2 = pd.DataFrame(dict2)
df2['index'] = range(len(df2))

# Key is 'A' only; the two 'index' columns become index_x and index_y
print(df1.merge(df2, on = 'A'))
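The pairing shown above is a many-to-many merge on A. If you would rather have pandas refuse such a merge than silently multiply rows, merge accepts a validate argument. A minimal sketch on the same data follows (the duplicate_keys flag is only for illustration):

```python
import pandas as pd
from pandas.errors import MergeError

df1 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})
df2 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})

try:
    # validate='one_to_one' checks that 'A' is unique in both frames
    # before merging; the repeated 2 violates that, so pandas raises.
    pd.merge(df1, df2, on='A', validate='one_to_one')
    duplicate_keys = False
except MergeError as err:
    duplicate_keys = True
    print('Merge rejected:', err)
```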

Answer 3

Answer by Qehu

Did you try df.drop_duplicates()?

import pandas as pd

dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}

df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)

df = pd.merge(df1, df2)
df_new = df.drop_duplicates()
print(df)
print(df_new)

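It removes the extra rows created by the merge. However, it also collapses the two original 2's into one, so the result is 2, 3, 4, 5 rather than the desired 2, 2, 3, 4, 5. It fits only if you don't need to keep repeated values.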
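Here is a minimal check of what drop_duplicates() does to the merged result on the question's data:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})
df2 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})

merged = pd.merge(df1, df2)          # 7 rows: four 2's, then 3, 4, 5
deduped = merged.drop_duplicates()   # drops every repeated row, keeping the first

print(len(merged), deduped['A'].tolist())  # 7 [2, 3, 4, 5]
```

Because drop_duplicates() cannot tell a duplicate created by the merge from one that was already in the input, it removes both.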
