
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/23197537/

Time: 2020-08-19 02:28:15  Source: igfitidea

Pandas' merge returns a column with _x appended to the name

python pandas

Question by luffe

I have two dataframes: df1 has columns A, B, C, D... and df2 has columns A, B, E, F...

The keys I want to merge on are in column A. B is also (most likely) the same in both dataframes. However, this is a big data set that I am still cleaning, so I do not yet have a very good overview of everything.

I do

pd.merge(df1, df2, on='A')

And the result contains a column called B_x. Since the data set is big and messy, I haven't tried to investigate how B_x differs from B in df1 and B in df2.
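A minimal reproduction with small, made-up dataframes (the values below are hypothetical, not the asker's data) shows that merging on A alone produces both a B_x and a B_y column, because B exists in both frames but is not a merge key:

```python
import pandas as pd

# Hypothetical toy data standing in for the real df1 and df2
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "w"], "E": [True, False, True]})

# Merge on A only; B appears in both frames, so pandas has to disambiguate it
merged = pd.merge(df1, df2, on="A")
print(list(merged.columns))  # ['A', 'B_x', 'C', 'B_y', 'E']
```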

So my question is just in general: what does Pandas mean when it has appended the _x to a column name in the merged dataframe?


Thank you

Accepted answer by EdChum

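The suffixes are added to any clashing column names that are not involved in the merge operation (i.e. columns that exist in both dataframes but are not merge keys). By default, pandas appends _x to the column from the left dataframe and _y to the one from the right; see the online docs.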
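A short sketch of that behaviour on hypothetical two-row frames: the default suffixes are ("_x", "_y"), the merge key A is never suffixed, and the suffixes parameter of pd.merge lets you choose more descriptive names (the "_df1"/"_df2" labels here are just an illustrative choice):

```python
import pandas as pd

# Hypothetical frames that share the non-key column B
df1 = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})
df2 = pd.DataFrame({"A": [1, 2], "B": ["x", "q"]})

# Default suffixes: _x marks the left frame's B, _y the right frame's B
default = pd.merge(df1, df2, on="A")
print(list(default.columns))  # ['A', 'B_x', 'B_y']

# Custom suffixes make the origin of each clashing column explicit
named = pd.merge(df1, df2, on="A", suffixes=("_df1", "_df2"))
print(list(named.columns))  # ['A', 'B_df1', 'B_df2']
```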

So in your case, if you think they are the same, you could just merge on both columns:

pd.merge(df1, df2, on=['A', 'B'])

Note, though, that this returns only the rows whose A and B values exist in both dataframes, as the default merge type is an inner merge.
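To see the effect of the inner merge, here is a sketch with hypothetical data where B disagrees for A == 3: merging on both columns silently drops that row, because the pair (3, "z") never appears in df2:

```python
import pandas as pd

# Hypothetical data: B matches for A == 1 and 2, but not for A == 3
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "w"], "E": [True, False, True]})

# how="inner" is the default, so only (A, B) pairs present in both frames survive
both = pd.merge(df1, df2, on=["A", "B"])
print(both["A"].tolist())  # [1, 2]
print(list(both.columns))  # ['A', 'B', 'C', 'E']
```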

So you could compare the size of this merged df with the first one (merged on A only) and see whether they are the same. If they are, you can either merge on both columns or just drop/rename the B columns with the _x/_y suffixes.
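The size comparison could look like the sketch below. It assumes A is unique within each dataframe (with duplicate keys, equal row counts are only a rough hint), and keeping df1's B when the two disagree is just one hypothetical choice; inspecting the mismatched rows first is the safer step:

```python
import pandas as pd

# Hypothetical data with one disagreement in B (A == 3)
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "w"], "E": [True, False, True]})

on_a = pd.merge(df1, df2, on="A")
on_ab = pd.merge(df1, df2, on=["A", "B"])

if len(on_a) == len(on_ab):
    # Same size: B agrees wherever A matches, so merging on both keys loses nothing
    result = on_ab
else:
    # Sizes differ: look at the rows where the two B columns disagree
    mismatched = on_a[on_a["B_x"] != on_a["B_y"]]
    print(mismatched)
    # One option: keep df1's B, drop df2's, and restore the plain column name
    result = on_a.drop(columns="B_y").rename(columns={"B_x": "B"})

print(list(result.columns))  # ['A', 'B', 'C', 'E']
```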

I would, though, spend some time determining whether these values are indeed the same and exist in both dataframes. If some of them do not, you may wish to perform an outer merge, which keeps the rows from either dataframe:

merged_df = pd.merge(df1, df2, on=['A', 'B'], how='outer')
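While investigating, the indicator=True option of pd.merge can help: it adds a _merge column recording whether each row came from both frames, only the left one, or only the right one. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data: (3, "z") exists only in df1, (4, "w") only in df2
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 4], "B": ["x", "y", "w"], "E": [True, False, True]})

# Outer merge keeps every (A, B) pair; indicator=True adds the "_merge" column
merged_df = pd.merge(df1, df2, on=["A", "B"], how="outer", indicator=True)
print(merged_df)

# Rows found in only one frame have NaN in the other frame's columns
left_only = merged_df.loc[merged_df["_merge"] == "left_only", "A"].tolist()
right_only = merged_df.loc[merged_df["_merge"] == "right_only", "A"].tolist()
print(left_only, right_only)  # [3] [4]
```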

You could then drop duplicate rows (and possibly any rows containing NaN), which should give you a clean merged dataframe:

merged_df.drop_duplicates(subset=['A', 'B'], inplace=True)

See the online docs for drop_duplicates (current pandas calls the column argument subset; older versions called it cols).
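Putting the clean-up steps together, a sketch on hypothetical data (df1 contains a duplicated row on purpose). Whether dropping NaN rows is appropriate depends on the data: here it removes every row that was found in only one of the two frames:

```python
import pandas as pd

# Hypothetical data: the (2, "y") row is duplicated in df1
df1 = pd.DataFrame({"A": [1, 2, 2, 3], "B": ["x", "y", "y", "z"], "C": [10, 20, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 4], "B": ["x", "y", "w"], "E": [True, False, True]})

merged_df = pd.merge(df1, df2, on=["A", "B"], how="outer")

# Keep the first row for each (A, B) pair
merged_df = merged_df.drop_duplicates(subset=["A", "B"])

# Optionally drop rows with any missing value (rows present in only one frame)
clean = merged_df.dropna()
print(clean)
```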