Can one perform a left join in pandas that selects only the first match on the right?

Question

Asked by Quant

Can one perform a left join in pandas that selects only the first match on the right? Example:

import pandas as pd

left            = pd.DataFrame()
left['age']     = [11, 12]
right           = pd.DataFrame()
right['age']    = [10, 11, 11]
right['salary'] = [ 100, 150, 200 ]
left.merge( right, how='left', on='age' )

Returns

   age  salary
0   11     150
1   11     200
2   12     NaN

But what I would like is to preserve the number of rows of left, by merely taking the first match. That is:

   age  salary
0   11     150
2   12     NaN

So I've been using

left.merge( right.drop_duplicates(['age']), how='left', on='age')

but I believe this makes a full copy of right. And it smells funny.
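
For reference, the approach above can be written as a self-contained sketch. The name right_first is introduced here for illustration only, and the validate argument is an optional addition not used in the original snippet: it asks merge to raise an error if the right-hand keys are not unique, which is one way to be sure the row count of left is preserved.

```python
import pandas as pd

left = pd.DataFrame({'age': [11, 12]})
right = pd.DataFrame({'age': [10, 11, 11], 'salary': [100, 150, 200]})

# Keep only the first row of right for each age (keep='first' is the default).
# drop_duplicates returns a new DataFrame rather than modifying right.
right_first = right.drop_duplicates(subset='age', keep='first')

# validate='many_to_one' makes merge raise MergeError if right_first
# still contained duplicate keys, so each row of left matches at most once.
result = left.merge(right_first, how='left', on='age', validate='many_to_one')
print(result)
```

With the sample data, result has the same number of rows as left, and the salary for age 12 is missing because right has no matching row.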

Is there a more elegant way?

Answer 1

Answered by samus

Yes, you can use groupby to remove the duplicate rows. Define left and right exactly as you did, then assign the result of your last line (the merge) to a new dataframe:

left2 = left.merge(right, how='left', on='age')
df = left2.groupby(['age'])['salary'].first().reset_index()
df

At first I used a .min(), which will give you the minimum salary at each age, as such:

df= left2.groupby(['age'])['salary'].min().reset_index()

But you were specifically asking about the first match. To get that, use .first() instead. Note: the .reset_index() at the end just turns the output of the groupby back into a dataframe.
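
A runnable sketch of the two variants, using the data from the question, might look like the following. The names merged, first, lowest, left_dup and collapsed are illustrative and not from the original post. The last part rests on an assumption worth stating: left_dup is a hypothetical input with a repeated age, added to show that grouping by age yields one row per distinct age, so this approach keeps the row count of left only when the ages in left are unique.

```python
import pandas as pd

left = pd.DataFrame({'age': [11, 12]})
right = pd.DataFrame({'age': [10, 11, 11], 'salary': [100, 150, 200]})

merged = left.merge(right, how='left', on='age')

# First matching salary per age; age 12 has no match, so it stays NaN.
first = merged.groupby('age')['salary'].first().reset_index()

# Minimum salary per age; the same as `first` here only because 150 is
# both the first and the smallest match for age 11.
lowest = merged.groupby('age')['salary'].min().reset_index()

print(first)
print(lowest)

# Hypothetical variant: left repeats an age. groupby collapses the repeats,
# so the result has fewer rows than left_dup.
left_dup = pd.DataFrame({'age': [11, 11, 12]})
collapsed = (
    left_dup.merge(right, how='left', on='age')
    .groupby('age')['salary']
    .first()
    .reset_index()
)
print(len(left_dup), len(collapsed))
```

If left can contain repeated keys, dropping duplicates from right before merging, as in the question, is the variant that keeps one output row per row of left.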
