pandas: filter a DataFrame by the column values of another DataFrame

Question

Asked by Moses Soleman

I have 2 dataframes

df1

Company           SKU   Sales
Walmart           A     100
Total             A     200
Walmart           B     200
Total             B     300
Walmart           C     400
Walmart           D     500

df2

 Company             SKU   Sales
 Walmart             A     400
 Total               B     300
 Walmart             C     900
 Walmart             F     400
 Total               G     500

I want a resulting dataframe (df2) which only has the records of matching SKUs in df1 and df2

df2

Company       SKU   Sales 
Walmart       A     400
Total         B     300
Walmart       C     900

Specifically, df2 should keep only the rows whose (Company, SKU) combination also appears in df1.

Is there any good solution to achieve this?

Answer 1

Answered by Anton vBR

Update

You could use a simple mask:

m = df2.SKU.isin(df1.SKU)
df2 = df2[m]
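
For reference, here is a minimal self-contained sketch of that mask applied to the sample data from the question. The frames are rebuilt by hand, and the result is stored under the hypothetical name `filtered` instead of overwriting `df2`. Note that the mask compares SKU only and ignores Company; for this sample that is enough to keep the rows with SKUs A, B and C.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Company": ["Walmart", "Total", "Walmart", "Total", "Walmart", "Walmart"],
    "SKU": ["A", "A", "B", "B", "C", "D"],
    "Sales": [100, 200, 200, 300, 400, 500],
})
df2 = pd.DataFrame({
    "Company": ["Walmart", "Total", "Walmart", "Walmart", "Total"],
    "SKU": ["A", "B", "C", "F", "G"],
    "Sales": [400, 300, 900, 400, 500],
})

# True for every df2 row whose SKU occurs anywhere in df1.SKU.
mask = df2["SKU"].isin(df1["SKU"])

# "filtered" is an illustrative name; the answer assigns back to df2.
filtered = df2[mask]
print(filtered)
```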

You are looking for an inner join. Try this:

df3 = df1.merge(df2, on=['SKU','Sales'], how='inner')

#  SKU  Sales
#0   A    100
#1   B    200
#2   C    300

Or this:

df3 = df1.merge(df2, on='SKU', how='inner')

#  SKU  Sales_x  Sales_y
#0   A      100      100
#1   B      200      200
#2   C      300      300
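
If the goal is to filter df2 on the (Company, SKU) pair without pulling in df1's Sales column, one possible variant is to merge against only the de-duplicated key columns of df1. This is a sketch under that assumption, not part of the original answer; the names `keys`, `df1_keys` and `result` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Company": ["Walmart", "Total", "Walmart", "Total", "Walmart", "Walmart"],
    "SKU": ["A", "A", "B", "B", "C", "D"],
    "Sales": [100, 200, 200, 300, 400, 500],
})
df2 = pd.DataFrame({
    "Company": ["Walmart", "Total", "Walmart", "Walmart", "Total"],
    "SKU": ["A", "B", "C", "F", "G"],
    "Sales": [400, 300, 900, 400, 500],
})

keys = ["Company", "SKU"]

# Keep only the key columns of df1, de-duplicated so that the join
# cannot repeat a df2 row and no Sales_x / Sales_y columns appear.
df1_keys = df1[keys].drop_duplicates()

# Inner join: df2 rows survive only if their (Company, SKU) pair is in df1.
result = df2.merge(df1_keys, on=keys, how="inner")
print(result)
```

Because the right-hand side carries only the key columns, the result has the same columns as df2.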

Answer 2

Answered by Ram

Solution 1:

# First identify the SKUs common to both frames
temp = list(set(df1.SKU) & set(df2.SKU))

# Filter df2 using the list of common SKUs
df3 = df2[df2.SKU.isin(temp)]
print(df3)

   SKU  Sales
0   A   400
1   B   300
2   C   900

Solution 2: one-line solution

df3 = df2[df2.SKU.isin(df1.SKU)]

EDIT 1: Solution for the updated question (not the optimal way of doing it, but it answers your question)

import pandas as pd

# reading data for df1 (raw string avoids an invalid-escape warning for \s)
df1 = pd.read_clipboard(sep=r'\s+')
df1
    Company SKU Sales
0   Walmart A   100
1   Total   A   200
2   Walmart B   200
3   Total   B   300
4   Walmart C   400
5   Walmart D   500

# reading data for df2
df2 = pd.read_clipboard(sep=r'\s+')
df2
    Company SKU Sales
0   Walmart A   400
1   Total   B   300
2   Walmart C   900
3   Walmart F   400
4   Total   G   500

# Using intersect and zip to create a list of tuples matching in the data frames
temp = list(set(list(zip(df1.Company,df1.SKU))).intersection(set(list(zip(df2.Company,df2.SKU)))))
temp
[('Walmart', 'A'), ('Walmart', 'C'), ('Total', 'B')]

# Create a helper column of (Company, SKU) tuples in df2, filter on it, then drop it
df2["temp"] = list(zip(df2.Company, df2.SKU))
df2 = df2[df2["temp"].isin(temp)].drop(columns="temp")
df2
    Company SKU Sales
0   Walmart A   400
1   Total   B   300
2   Walmart C   900

Suggestions for improving this code are welcome.

Answer 3

Answered by jpp

One way is to align indices and then use a mask.

# align indices
df1 = df1.set_index(['Company',  'SKU'])
df2 = df2.set_index(['Company',  'SKU'])

# calculate & apply mask
df2 = df2[df2.index.isin(df1.index)].reset_index()

Resetting the index is optional; it is only needed to turn Company and SKU back into columns.
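
A related sketch, assuming you would rather leave the indices of df1 and df2 untouched: build temporary MultiIndex objects from the key columns with `pd.MultiIndex.from_frame` and use them only for the mask. The data is copied from the question; `keys`, `mask` and `result` are illustrative names, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Company": ["Walmart", "Total", "Walmart", "Total", "Walmart", "Walmart"],
    "SKU": ["A", "A", "B", "B", "C", "D"],
    "Sales": [100, 200, 200, 300, 400, 500],
})
df2 = pd.DataFrame({
    "Company": ["Walmart", "Total", "Walmart", "Walmart", "Total"],
    "SKU": ["A", "B", "C", "F", "G"],
    "Sales": [400, 300, 900, 400, 500],
})

keys = ["Company", "SKU"]

# Temporary MultiIndex objects built from the key columns;
# df1 and df2 themselves keep their default integer index.
idx1 = pd.MultiIndex.from_frame(df1[keys])
idx2 = pd.MultiIndex.from_frame(df2[keys])

# Boolean array: True where a df2 (Company, SKU) pair exists in df1.
mask = idx2.isin(idx1)

result = df2[mask]
print(result)
```

Since Company and SKU never leave the columns, no `reset_index()` is needed afterwards.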
