How to use pandas isin for multiple columns
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not the translator): StackOverflow
Original: http://stackoverflow.com/questions/45198786/
Asked by Jun Jang
I want to find the values of col1 and col2 where the col1 and col2 of the first dataframe are both in the second dataframe.
These rows should be in the result dataframe:
pizza, boy
pizza, girl
ice cream, boy
because all three rows are in the first and second dataframes.
How can I accomplish this? I was thinking of using isin, but I am not sure how to use it when I have to consider more than one column.
Answered by unutbu
Perform an inner merge on col1 and col2:
import pandas as pd
df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'], 'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1,6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'], 'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10,17))
print(pd.merge(df2.reset_index(), df1, how='inner').set_index('index'))
yields
col1 col2
index
10 pizza boy
11 pizza girl
16 ice cream boy
The purpose of the reset_index and set_index calls is to preserve df2's index, as in the desired result you posted. If the index is not important, then
pd.merge(df2, df1, how='inner')
# col1 col2
# 0 pizza boy
# 1 pizza girl
# 2 ice cream boy
would suffice.
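One thing to watch with the merge approach: if the same (col1, col2) pair occurs more than once in df1, an inner merge repeats the matching df2 row once per occurrence. Here is a minimal sketch of that effect and one way around it. The small frames below are made up for illustration (they are not the data from the question), and passing on= explicitly is my own choice rather than part of the code above.

```python
import pandas as pd

# Illustrative data: ('pizza', 'boy') appears twice in df1.
df1 = pd.DataFrame({'col1': ['pizza', 'pizza', 'ice cream'],
                    'col2': ['boy', 'boy', 'boy']})
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'cake', 'ice cream'],
                    'col2': ['boy', 'girl', 'boy', 'boy']},
                   index=range(10, 14))

keys = ['col1', 'col2']

# Naive inner merge: the repeated pair in df1 duplicates the matching df2 row.
naive = pd.merge(df2, df1, how='inner', on=keys)

# Drop duplicate key pairs first so each df2 row appears at most once,
# and keep df2's index the same way as above.
unique_keys = df1[keys].drop_duplicates()
deduped = (pd.merge(df2.reset_index(), unique_keys, how='inner', on=keys)
             .set_index('index'))

print(len(naive))    # number of rows from the naive merge
print(deduped)
```

If df1 is known to have unique pairs, the extra drop_duplicates step is not needed.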
Alternatively, you could construct MultiIndexes out of the col1 and col2 columns, and then call the MultiIndex.isin method:
index1 = pd.MultiIndex.from_arrays([df1[col] for col in ['col1', 'col2']])
index2 = pd.MultiIndex.from_arrays([df2[col] for col in ['col1', 'col2']])
print(df2.loc[index2.isin(index1)])
yields
col1 col2
10 pizza boy
11 pizza girl
16 ice cream boy
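Since MultiIndex.isin returns a boolean mask, the same mask can also be negated to get the rows of df2 that are not in df1. The sketch below reuses the df1 and df2 from above; building the indexes with pd.MultiIndex.from_frame is just an alternative spelling of from_arrays that I am assuming is available in your pandas version (it was added in 0.24).

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'],
                    'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1, 6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'],
                    'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10, 17))

keys = ['col1', 'col2']
index1 = pd.MultiIndex.from_frame(df1[keys])
index2 = pd.MultiIndex.from_frame(df2[keys])

# One boolean per row of df2: True where the (col1, col2) pair also occurs in df1.
mask = index2.isin(index1)

in_both = df2.loc[mask]        # rows of df2 whose pair is in df1
only_in_df2 = df2.loc[~mask]   # the "not isin" version

print(in_both)
print(only_in_df2)
```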
Answered by Ningrong Ye
Thank you unutbu! Here is a little update.
import pandas as pd
df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'], 'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1,6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'], 'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10,17))
df1[df1.set_index(['col1','col2']).index.isin(df2.set_index(['col1','col2']).index)]
returns:
col1 col2
1 pizza boy
4 pizza girl
5 ice cream boy
Answered by u9628793
If you must stick to isin or the negated version ~isin, you may first create a new column with the concatenation of col1 and col2, then use isin to filter your data. Here is the code:
import pandas as pd
df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'], 'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1,6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'], 'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10,17))
df1['indicator'] = df1['col1'].str.cat(df1['col2'])
df2['indicator'] = df2['col1'].str.cat(df2['col2'])
df2.loc[df2['indicator'].isin(df1['indicator'])].drop(columns=['indicator'])
which gives
col1 col2
10 pizza boy
11 pizza girl
16 ice cream boy
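A caveat with plain concatenation: two different pairs can produce the same string, for example 'ab' + 'c' and 'a' + 'bc'. A minimal sketch of the problem and a possible fix using the sep argument of str.cat follows. The frames left and right and the choice of '|' as the separator are assumptions for illustration; the separator only helps if it never occurs in the data.

```python
import pandas as pd

# Illustrative data: row 0 of each frame is a different pair that concatenates to 'abc'.
left = pd.DataFrame({'col1': ['ab', 'pizza'], 'col2': ['c', 'boy']})
right = pd.DataFrame({'col1': ['a', 'pizza'], 'col2': ['bc', 'boy']})

# No separator: 'ab' + 'c' and 'a' + 'bc' both become 'abc', so row 0 matches falsely.
plain_mask = left['col1'].str.cat(left['col2']).isin(
    right['col1'].str.cat(right['col2']))

# With a separator the column boundary is preserved in the combined string.
SEP = '|'  # assumption: '|' does not appear in col1 or col2
sep_mask = left['col1'].str.cat(left['col2'], sep=SEP).isin(
    right['col1'].str.cat(right['col2'], sep=SEP))

print(plain_mask.tolist())
print(sep_mask.tolist())
```

The merge and MultiIndex answers compare the columns pairwise, so they do not have this problem.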