Python Pandas merge returns columns with _x appended to the name

Question

Asked by luffe

I have two dataframes: df1 has columns A, B, C, D... and df2 has columns A, B, E, F...

The keys I want to merge on are in column A. B is also (most likely) the same in both dataframes. This is a big data set that I am still cleaning, though, so I do not have a very good overview of everything yet.

I do:

merge(df1, df2, on='A')

And the result contains a column called B_x. Since the data set is big and messy, I haven't tried to investigate how B_x differs from B in df1 and B in df2.

So my question is a general one: what does Pandas mean when it appends _x to a column name in the merged dataframe?

Thank you


Answer 1

Accepted answer by EdChum

The suffixes are added to any clashing column names that are not involved in the merge operation; see the online docs. By default, the column from the left dataframe gets _x and the column from the right dataframe gets _y.
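To make the suffix behaviour concrete, here is a minimal sketch. The toy data is hypothetical and only mirrors the column layout described in the question; the `suffixes` parameter of `pd.merge` is used to show that the default pair can be replaced.

```python
import pandas as pd

# Hypothetical toy frames: both contain A (the key) and B (a clashing column).
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "q"], "E": [0.1, 0.2, 0.3]})

# B exists in both frames but is not a merge key, so each copy gets a suffix.
merged = pd.merge(df1, df2, on="A")
default_columns = list(merged.columns)

# The default ('_x', '_y') pair can be swapped for more descriptive suffixes.
named = pd.merge(df1, df2, on="A", suffixes=("_df1", "_df2"))
named_columns = list(named.columns)

print(default_columns)
print(named_columns)
```

Here B_x holds the values that came from df1 and B_y the values from df2, so the two can be compared row by row.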

So in your case, if you think that they are the same, you could just do the merge on both columns:

pd.merge(df1, df2, on=['A', 'B'])

What this will do, though, is return only the rows whose A and B values exist in both dataframes, as the default merge type is an inner merge.
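As a rough illustration of how an inner merge on both columns can silently drop rows, consider the following sketch. The data is made up for the example; only the (A, B) pair (1, "x") appears in both frames.

```python
import pandas as pd

# Hypothetical data: the two frames agree on only one (A, B) pair.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 4], "B": ["x", "w", "z"], "E": [0.1, 0.2, 0.4]})

# how='inner' is the default, so only pairs present in both frames survive.
inner = pd.merge(df1, df2, on=["A", "B"])

print(inner)
print(len(df1), len(inner))
```

Because B is now a merge key, no suffixed columns appear, but two of the three rows of df1 are gone.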

So what you could do is compare the size of this merged df with your first one and see if they are the same. If so, you could do the merge on both columns, or just drop/rename the _x/_y-suffixed B columns.
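One possible way to run that check is sketched below, again on hypothetical data. It compares row counts and also lists the rows where B_x and B_y disagree after merging on A alone. Note that the row-count comparison is only a hint if the keys contain duplicates, and that `!=` treats NaN as unequal to NaN.

```python
import pandas as pd

# Hypothetical data: B differs between the frames for A == 3.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "q"], "E": [0.1, 0.2, 0.3]})

on_a = pd.merge(df1, df2, on="A")
on_ab = pd.merge(df1, df2, on=["A", "B"])

# Equal row counts would suggest every (A, B) pair in df1 found a match.
all_pairs_matched = len(on_ab) == len(df1)

# Rows where the two copies of B disagree after merging on A only.
mismatches = on_a[on_a["B_x"] != on_a["B_y"]]

# Only if nothing disagrees: keep one copy of B and restore its name.
if mismatches.empty:
    on_a = on_a.drop(columns="B_y").rename(columns={"B_x": "B"})

print(all_pairs_matched)
print(mismatches)
```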

I would spend some time, though, determining whether these values are indeed the same and exist in both dataframes, in which case you may wish to perform an outer merge:

pd.merge(df1, df2, on=['A', 'B'], how='outer')

Then you could drop duplicate rows (and possibly any NaN rows), which should give you a clean merged dataframe.

merged_df.drop_duplicates(subset=['A', 'B'], inplace=True)

See the online docs for drop_duplicates.
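Putting the outer merge and the cleanup together might look like the sketch below. The data is hypothetical, and the `indicator=True` argument is an addition not mentioned in the answer: it adds a `_merge` column recording whether each row came from the left frame, the right frame, or both.

```python
import pandas as pd

# Hypothetical data: df1 contains a duplicated row, df2 has an unmatched key.
df1 = pd.DataFrame({"A": [1, 2, 2], "B": ["x", "y", "y"], "C": [10, 20, 20]})
df2 = pd.DataFrame({"A": [1, 4], "B": ["x", "z"], "E": [0.1, 0.4]})

# An outer merge keeps every (A, B) pair from either frame, filling gaps with NaN.
outer = pd.merge(df1, df2, on=["A", "B"], how="outer", indicator=True)

# Keep the first row for each (A, B) pair.
deduped = outer.drop_duplicates(subset=["A", "B"])

# Optionally keep only rows with no missing values at all.
complete = deduped.dropna()

print(outer)
print(len(outer), len(deduped), len(complete))
```

Returning a new frame from `drop_duplicates` instead of using `inplace=True` is a style choice here; both forms remove the same rows.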
