pandas: filter a dataframe based on the column values of another dataframe

Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/50655370/

Date: 2020-09-14 05:38:06  Source: igfitidea

Filtering the dataframe based on the column value of another dataframe

python pandas filter merge

Asked by Moses Soleman

I have 2 dataframes

df1

Company           SKU   Sales
Walmart           A     100
Total             A     200
Walmart           B     200
Total             B     300
Walmart           C     400
Walmart           D     500

df2

 Company             SKU   Sales
 Walmart             A     400
 Total               B     300
 Walmart             C     900
 Walmart             F     400
 Total               G     500

I want a resulting dataframe (df2) which only has the records of matching SKUs in df1 and df2

df2

Company       SKU   Sales 
Walmart       A     400
Total         B     300
Walmart       C     900

I want only the unique (Company + SKU) values of df1 in df2

Is there any good solution to achieve this?

Answered by Anton vBR

Update

You could use a simple mask:

m = df2.SKU.isin(df1.SKU)
df2 = df2[m]
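
As a self-contained check, here is the mask applied to the sample data from the question. The frame construction is my own transcription of the tables above, and `result` is an illustrative name. Note that the mask compares SKU only, so it would also keep a df2 row whose SKU appears in df1 under a different Company; with this sample that does not happen.

```python
import pandas as pd

# Sample data transcribed from the question
df1 = pd.DataFrame({
    "Company": ["Walmart", "Total", "Walmart", "Total", "Walmart", "Walmart"],
    "SKU": ["A", "A", "B", "B", "C", "D"],
    "Sales": [100, 200, 200, 300, 400, 500],
})
df2 = pd.DataFrame({
    "Company": ["Walmart", "Total", "Walmart", "Walmart", "Total"],
    "SKU": ["A", "B", "C", "F", "G"],
    "Sales": [400, 300, 900, 400, 500],
})

# True for every df2 row whose SKU also occurs somewhere in df1
m = df2["SKU"].isin(df1["SKU"])

# Keep only the rows where the mask is True (SKUs A, B and C here)
result = df2[m]
print(result)
```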


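You are looking for an inner join. Try this (note that the output shown below does not correspond to the sample data above; it appears to come from an earlier version of the question):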

df3 = df1.merge(df2, on=['SKU','Sales'], how='inner')

#  SKU  Sales
#0   A    100
#1   B    200
#2   C    300

Or this:

df3 = df1.merge(df2, on='SKU', how='inner')

#  SKU  Sales_x  Sales_y
#0   A      100      100
#1   B      200      200
#2   C      300      300
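
For the updated question, which matches on Company and SKU together, one possible variant of the inner join is to merge df2 against only the key columns of df1. This is a sketch under my own assumptions rather than the answerer's code: the frames are transcribed from the question, and `keys`, `df1_keys` and `result` are illustrative names.

```python
import pandas as pd

# Sample data transcribed from the question
df1 = pd.DataFrame({
    "Company": ["Walmart", "Total", "Walmart", "Total", "Walmart", "Walmart"],
    "SKU": ["A", "A", "B", "B", "C", "D"],
    "Sales": [100, 200, 200, 300, 400, 500],
})
df2 = pd.DataFrame({
    "Company": ["Walmart", "Total", "Walmart", "Walmart", "Total"],
    "SKU": ["A", "B", "C", "F", "G"],
    "Sales": [400, 300, 900, 400, 500],
})

keys = ["Company", "SKU"]

# Keep only the key columns of df1 so df1's Sales is not pulled in,
# and drop duplicate pairs so the join cannot multiply df2 rows
df1_keys = df1[keys].drop_duplicates()

# Inner join: keeps the df2 rows whose (Company, SKU) pair exists in df1
result = df2.merge(df1_keys, on=keys, how="inner")
print(result)
```

Because only the key columns of df1 take part in the merge, the result keeps df2's single Sales column instead of producing Sales_x and Sales_y.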

Answered by Ram

Solution 1:

# First identify the SKUs common to both data frames
temp = list(set(df1.SKU) & set(df2.SKU))

# Filter df2 using the list of common SKUs
df3 = df2[df2.SKU.isin(temp)]
print(df3)

   SKU  Sales
0   A   400
1   B   300
2   C   900

Solution 2: one-line solution

df3 = df2[df2.SKU.isin(list(df1.SKU))]

EDIT 1: Solution for the updated question (not the optimal way of doing it, but it answers your question)

# reading data for df1 (copy the df1 table to the clipboard first)
import pandas as pd
df1 = pd.read_clipboard(sep=r'\s+')
df1
    Company SKU Sales
0   Walmart A   100
1   Total   A   200
2   Walmart B   200
3   Total   B   300
4   Walmart C   400
5   Walmart D   500

# reading data for df2 (copy the df2 table to the clipboard first)
df2 = pd.read_clipboard(sep=r'\s+')
df2
    Company SKU Sales
0   Walmart A   400
1   Total   B   300
2   Walmart C   900
3   Walmart F   400
4   Total   G   500

# Use zip and a set intersection to list the (Company, SKU) pairs present in both data frames
temp = list(set(zip(df1.Company, df1.SKU)) & set(zip(df2.Company, df2.SKU)))
temp
[('Walmart', 'A'), ('Walmart', 'C'), ('Total', 'B')]

# Create a helper column in df2 to look up in the temp list
df2["temp"] = list(zip(df2.Company, df2.SKU))
df2 = df2[df2["temp"].isin(temp)]
del df2["temp"]
df2
    Company SKU Sales
0   Walmart A   400
1   Total   B   300
2   Walmart C   900

Suggestions are welcome to improve this code
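
One possible simplification, offered as a sketch rather than a definitive improvement: build the set of (Company, SKU) pairs from df1 once and test each df2 row against it, which avoids the helper column. The frames are transcribed from the question, and `pairs`, `mask` and `result` are illustrative names.

```python
import pandas as pd

# Sample data transcribed from the question
df1 = pd.DataFrame({
    "Company": ["Walmart", "Total", "Walmart", "Total", "Walmart", "Walmart"],
    "SKU": ["A", "A", "B", "B", "C", "D"],
    "Sales": [100, 200, 200, 300, 400, 500],
})
df2 = pd.DataFrame({
    "Company": ["Walmart", "Total", "Walmart", "Walmart", "Total"],
    "SKU": ["A", "B", "C", "F", "G"],
    "Sales": [400, 300, 900, 400, 500],
})

# Unique (Company, SKU) pairs that occur in df1
pairs = set(zip(df1["Company"], df1["SKU"]))

# One boolean per df2 row: is its (Company, SKU) pair present in df1?
mask = [pair in pairs for pair in zip(df2["Company"], df2["SKU"])]

# A list of booleans with one entry per row works as a row filter
result = df2[mask]
print(result)
```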

Answered by jpp

One way is to align indices and then use a mask.

# align indices
df1 = df1.set_index(['Company',  'SKU'])
df2 = df2.set_index(['Company',  'SKU'])

# calculate & apply mask
df2 = df2[df2.index.isin(df1.index)].reset_index()

Resetting the index is not required, but it is needed to turn Company and SKU back into columns.
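
The same index-based mask can also be computed without reassigning df1 and df2, since `set_index` returns a new frame. The following is a minimal sketch on the question's sample data; the frame construction is my own transcription, and `keys`, `mask` and `result` are illustrative names.

```python
import pandas as pd

# Sample data transcribed from the question
df1 = pd.DataFrame({
    "Company": ["Walmart", "Total", "Walmart", "Total", "Walmart", "Walmart"],
    "SKU": ["A", "A", "B", "B", "C", "D"],
    "Sales": [100, 200, 200, 300, 400, 500],
})
df2 = pd.DataFrame({
    "Company": ["Walmart", "Total", "Walmart", "Walmart", "Total"],
    "SKU": ["A", "B", "C", "F", "G"],
    "Sales": [400, 300, 900, 400, 500],
})

keys = ["Company", "SKU"]

# Build temporary (Company, SKU) indexes and compare them;
# df1 and df2 themselves are left unchanged
mask = df2.set_index(keys).index.isin(df1.set_index(keys).index)

# Company and SKU stay ordinary columns, so no reset_index is needed
result = df2[mask]
print(result)
```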
