Source: Stack Overflow, http://stackoverflow.com/questions/26258996/. Provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.

Can one perform a left join in pandas that selects only the first match on the right?

python pandas left-join

Asked by Quant

Can one perform a left join in pandas that selects only the first match on the right? Example:

import pandas as pd

left = pd.DataFrame()
left['age'] = [11, 12]
right = pd.DataFrame()
right['age'] = [10, 11, 11]
right['salary'] = [100, 150, 200]
left.merge(right, how='left', on='age')

Returns

   age  salary
0   11     150
1   11     200
2   12     NaN

But what I would like is to preserve the number of rows of left, by merely taking the first match. That is:

   age  salary
0   11     150
2   12     NaN

So I've been using

left.merge(right.drop_duplicates(['age']), how='left', on='age')

but I believe this makes a full copy of right. And it smells funny.

Is there a more elegant way?
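
For reference, the workaround described in the question can be written as a self-contained sketch. The name first_right and the validate argument are additions for illustration and are not part of the original question. validate="many_to_one" asks pandas to raise an error if the right-hand keys are not unique, which guards the assumption that the merge cannot add rows.

```python
import pandas as pd

left = pd.DataFrame({"age": [11, 12]})
right = pd.DataFrame({"age": [10, 11, 11], "salary": [100, 150, 200]})

# Keep only the first row of `right` for each age (keep="first" is the default).
first_right = right.drop_duplicates(subset="age", keep="first")

# With unique keys on the right, a left join cannot add rows to `left`.
# validate="many_to_one" raises pandas.errors.MergeError if duplicates remain.
result = left.merge(first_right, how="left", on="age", validate="many_to_one")
print(result)
```

The result has one row per row of left, with a missing salary for age 12 because right has no match for it.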

Answered by samus

Yes, you can use groupby to remove the duplicate rows. Define left and right exactly as you did. Then, starting from your last line, I define a new dataframe:

left2 = left.merge(right, how='left', on='age')
df = left2.groupby(['age'])['salary'].first().reset_index()
df

At first I used a .min(), which will give you the minimum salary at each age, as such:

df = left2.groupby(['age'])['salary'].min().reset_index()

But you were specifically asking about the first match. To get that, use the .first() method. Note: the .reset_index() at the end simply converts the output of the groupby back into a dataframe.
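
Put together, the groupby approach might look like the following sketch. The names merged, first_match, left_dup and collapsed are chosen here for illustration. Two caveats are worth checking against your own data: by default .first() returns the first non-null value in each group, and grouping by age yields one row per distinct age, so duplicate ages in left are collapsed rather than preserved.

```python
import pandas as pd

left = pd.DataFrame({"age": [11, 12]})
right = pd.DataFrame({"age": [10, 11, 11], "salary": [100, 150, 200]})

# The plain left join duplicates age 11 because `right` matches it twice.
merged = left.merge(right, how="left", on="age")

# Take the first salary within each age group, then turn the
# resulting Series back into a DataFrame with `age` as a column.
first_match = merged.groupby("age")["salary"].first().reset_index()
print(first_match)

# Caveat: if `left` itself repeats an age, groupby keeps one row per age,
# so the row count of `left` is not preserved.
left_dup = pd.DataFrame({"age": [11, 11, 12]})
collapsed = (
    left_dup.merge(right, how="left", on="age")
    .groupby("age")["salary"]
    .first()
    .reset_index()
)
print(len(left_dup), len(collapsed))
```

If left can contain repeated keys and its row count must be kept, deduplicating right before the merge, as in the question, avoids this collapse.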
