Original question: http://stackoverflow.com/questions/23197537/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Pandas' merge returns a column with _x appended to the name
Asked by luffe
I have two dataframes: df1 has columns A, B, C, D... and df2 has columns A, B, E, F...
The keys I want to merge on are in column A. B is also (most likely) the same in both dataframes, though this is a big data set that I am still cleaning, so I do not have a very good overview of everything yet.
I do
merge(df1, df2, on='A')
And the result contains a column called B_x. Since the data set is big and messy, I haven't tried to investigate how B_x differs from B in df1 and B in df2.
So my question is just in general: what does Pandas mean when it has appended the _x to a column name in the merged dataframe?
Thank you
Accepted answer by EdChum
The suffixes are added to any column names that clash between the two dataframes and are not used as merge keys: _x marks the column from the left dataframe and _y the one from the right. See the online docs.
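As a small illustration of the point above, the sketch below uses two made-up frames (the values are hypothetical stand-ins for the asker's df1 and df2) to show where B_x and B_y come from, and how the suffixes parameter of merge can give them more descriptive names.

```python
import pandas as pd

# Hypothetical toy data standing in for df1 and df2 from the question.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 4], "B": ["x", "q", "w"], "E": [0.1, 0.2, 0.4]})

# B exists in both frames but is not a merge key, so pandas keeps both
# copies and disambiguates them: B_x comes from df1, B_y from df2.
merged = pd.merge(df1, df2, on="A")
print(list(merged.columns))  # ['A', 'B_x', 'C', 'B_y', 'E']

# The suffixes parameter replaces the default ('_x', '_y') pair.
named = pd.merge(df1, df2, on="A", suffixes=("_df1", "_df2"))
print(list(named.columns))  # ['A', 'B_df1', 'C', 'B_df2', 'E']
```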
So in your case, if you think that they are the same, you could just do the merge on both columns:
pd.merge(df1, df2, on=['A', 'B'])
What this will do, though, is return only the rows where A and B exist in both dataframes, as the default merge type is an inner merge.
So what you could do is compare the size of this merged df with your first one and see if they are the same. If so, you could do the merge on both columns, or just drop/rename the _x/_y suffixed B columns.
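One way to run that check is sketched below, again with hypothetical toy frames rather than the asker's data. It compares the row counts of the two merges, lists the keys where the two copies of B disagree, and only collapses them into a single B column if every row agrees.

```python
import pandas as pd

# Hypothetical toy data; B disagrees between the frames for A == 2.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 4], "B": ["x", "q", "w"], "E": [0.1, 0.2, 0.4]})

on_a = pd.merge(df1, df2, on="A")          # keeps B_x and B_y
on_ab = pd.merge(df1, df2, on=["A", "B"])  # keeps a single B

# Fewer rows in the two-key merge means some B values do not match.
print(len(on_a), len(on_ab))

# Show the offending rows. Note that NaN == NaN is False in pandas,
# so rows with missing B on both sides are also reported here.
same = on_a["B_x"] == on_a["B_y"]
mismatches = on_a.loc[~same, ["A", "B_x", "B_y"]]
print(mismatches)

# Only when every row agrees is it safe to keep one copy of B.
if same.all():
    on_a = on_a.drop(columns="B_y").rename(columns={"B_x": "B"})
```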
I would spend time, though, determining whether these values are indeed the same and exist in both dataframes, in which case you may wish to perform an outer merge:
pd.merge(df1, df2, on=['A', 'B'], how='outer')
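To see which rows the outer merge actually adds, merge also accepts indicator=True, which appends a _merge column recording where each row came from. The frames below are hypothetical and only meant as a sketch.

```python
import pandas as pd

# Hypothetical toy data standing in for df1 and df2.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df2 = pd.DataFrame({"A": [1, 2, 4], "B": ["x", "q", "w"], "E": [0.1, 0.2, 0.4]})

# indicator=True adds a _merge column with the values
# 'left_only', 'right_only' or 'both' for every row.
outer = pd.merge(df1, df2, on=["A", "B"], how="outer", indicator=True)
print(outer["_merge"].value_counts())

# Rows whose (A, B) pair was found in only one of the two frames;
# their columns from the other frame are filled with NaN.
unmatched = outer[outer["_merge"] != "both"]
print(unmatched)
```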
Then what you could do is drop duplicate rows (and possibly any NaN rows), and that should give you a clean merged dataframe.
merged_df.drop_duplicates(subset=['A', 'B'], inplace=True)
See online docs for drop_duplicates
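A minimal sketch of that clean-up step follows, using a hypothetical merged_df. It uses the subset parameter of drop_duplicates and returns new frames instead of modifying in place, which is a style choice here and not something the answer requires.

```python
import pandas as pd

# Hypothetical merged result with a repeated (A, B) pair and a missing B.
merged_df = pd.DataFrame(
    {"A": [1, 1, 2, 3], "B": ["x", "x", "y", None], "C": [10, 11, 20, 30]}
)

# Keep the first row for each (A, B) pair; later repeats are dropped.
deduped = merged_df.drop_duplicates(subset=["A", "B"])

# Optionally drop rows that still contain a missing value in any column.
cleaned = deduped.dropna()
print(cleaned)
```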