Can one perform a left join in pandas that selects only the first match on the right?
Original source: http://stackoverflow.com/questions/26258996/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Quant
Can one perform a left join in pandas that selects only the first match on the right? Example:
import pandas as pd

left = pd.DataFrame()
left['age'] = [11, 12]
right = pd.DataFrame()
right['age'] = [10, 11, 11]
right['salary'] = [100, 150, 200]
left.merge(right, how='left', on='age')
Returns
   age  salary
0   11     150
1   11     200
2   12     NaN
But what I would like is to preserve the number of rows of left, by merely taking the first match. That is:
   age  salary
0   11     150
2   12     NaN
So I've been using
left.merge(right.drop_duplicates(['age']), how='left', on='age')
but I believe this makes a full copy of right. And it smells funny.
Is there a more elegant way?
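For reference, the workaround described in the question can be written as a small self-contained sketch. The variable names first_right and result are illustrative and do not come from the original post. The validate='many_to_one' argument is an optional addition, not part of the asker's code; it asks pandas to raise an error if the right-hand keys are not unique, which guards the "one row per left row" expectation.

```python
import pandas as pd

left = pd.DataFrame({'age': [11, 12]})
right = pd.DataFrame({'age': [10, 11, 11], 'salary': [100, 150, 200]})

# Keep only the first row of `right` for each age (keep='first' is the default).
first_right = right.drop_duplicates(subset='age', keep='first')

# Left join; validate='many_to_one' raises MergeError if `first_right`
# still contained duplicate ages, so each left row matches at most once.
result = left.merge(first_right, how='left', on='age', validate='many_to_one')

print(result)
print(len(result) == len(left))
```

With this approach the result has the same number of rows as left, and ages with no match on the right get NaN for salary.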
Answered by samus
Yes, you can use groupby to remove your duplicate lines. Do everything you've done to define left and right. Now, I define a new dataframe on your last line:
left2 = left.merge(right, how='left', on='age')
df = left2.groupby(['age'])['salary'].first().reset_index()
df
At first I used a .min(), which will give you the minimum salary at each age, as such:
df = left2.groupby(['age'])['salary'].min().reset_index()
But you were specifically asking about the first match. To get that, use the .first() method instead. Note: the .reset_index() at the end just converts the output of the groupby back into a DataFrame.
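The difference between .first() and .min() does not show up with the question's data, because the first salary for age 11 is also the smallest. The sketch below reorders the salaries (an assumption made purely for illustration, not the asker's data) so that the two results differ. The names merged, first_match and min_match are likewise illustrative.

```python
import pandas as pd

left = pd.DataFrame({'age': [11, 12]})
# Hypothetical variation on the question's data: for age 11 the first
# salary (200) is no longer the smallest one (150).
right = pd.DataFrame({'age': [10, 11, 11], 'salary': [100, 200, 150]})

merged = left.merge(right, how='left', on='age')

# .first() takes the first non-null salary in each age group.
first_match = merged.groupby('age')['salary'].first().reset_index()

# .min() takes the smallest salary in each age group.
min_match = merged.groupby('age')['salary'].min().reset_index()

print(first_match)
print(min_match)
```

Two caveats apply to this approach. Grouping after the merge also collapses repeated ages in left, so the row count of left is preserved only when its ages are unique. In addition, .first() skips null values within a group, so it returns the first non-null match rather than strictly the first row.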

