How to use pandas isin for multiple columns

Question

Asked by Jun Jang

I want to find the values of col1 and col2 where the col1 and col2 of the first dataframe are both in the second dataframe, that is, the (col1, col2) pairs that appear in both dataframes.

These rows should be in the result dataframe:

pizza, boy
pizza, girl
ice cream, boy

because all three rows are in the first and second dataframes.

How can I accomplish this? I was thinking of using isin, but I am not sure how to use it when I have to consider more than one column.
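
A natural first attempt is to call isin on each column separately and combine the masks with &. That checks each column independently, so a row can pass even when its (col1, col2) pair never occurs in the other dataframe. The sketch below uses small hypothetical dataframes (not the question's data) with an extra ('ice cream', 'girl') row to show the false match, and then checks the pairs as tuples with a plain Python set instead:

```python
import pandas as pd

# Hypothetical data: ('ice cream', 'girl') is NOT a pair in df1,
# although 'ice cream' and 'girl' each occur in df1 on their own.
df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'],
                    'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1, 6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'ice cream', 'ice cream'],
                    'col2': ['boy', 'girl', 'girl', 'boy', 'girl']}, index=range(10, 15))

# Naive: each column is tested on its own, so the pairing is lost
# and row 14 ('ice cream', 'girl') is wrongly kept.
naive = df2[df2['col1'].isin(df1['col1']) & df2['col2'].isin(df1['col2'])]

# Pair-aware: build the set of (col1, col2) tuples in df1 and test each df2 row's tuple.
pairs = set(zip(df1['col1'], df1['col2']))
paired = df2.loc[[pair in pairs for pair in zip(df2['col1'], df2['col2'])]]

print(naive)
print(paired)
```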

Answer 1

Answered by unutbu

Perform an inner merge on col1 and col2:

import pandas as pd
df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'], 'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1,6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'], 'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10,17))

print(pd.merge(df2.reset_index(), df1, how='inner').set_index('index'))

yields

            col1  col2
index                 
10         pizza   boy
11         pizza  girl
16     ice cream   boy

The purpose of the reset_index and set_index calls is to preserve df2's index, as in the desired result you posted. If the index is not important, then

pd.merge(df2, df1, how='inner')
#         col1  col2
# 0      pizza   boy
# 1      pizza  girl
# 2  ice cream   boy

would suffice.
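
One thing to keep in mind with the merge approach: if df1 contains the same (col1, col2) pair more than once, each matching df2 row is repeated once per duplicate. A minimal sketch with hypothetical data, deduplicating df1's key columns with drop_duplicates before merging:

```python
import pandas as pd

# Hypothetical data: df1 lists the pair ('pizza', 'boy') twice.
df1 = pd.DataFrame({'col1': ['pizza', 'pizza', 'ice cream'],
                    'col2': ['boy', 'boy', 'boy']})
df2 = pd.DataFrame({'col1': ['pizza', 'chicken', 'ice cream'],
                    'col2': ['boy', 'girl', 'boy']}, index=range(10, 13))

# Merging directly repeats the ('pizza', 'boy') row of df2 once per match in df1.
dup = pd.merge(df2, df1, how='inner')
print(len(dup))  # 3

# Keeping one copy of each key pair in df1 gives one output row per matching df2 row.
keys = df1[['col1', 'col2']].drop_duplicates()
result = pd.merge(df2.reset_index(), keys, how='inner').set_index('index')
print(result)
```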

Alternatively, you could construct MultiIndexes out of the col1 and col2 columns, and then call the MultiIndex.isin method:

index1 = pd.MultiIndex.from_arrays([df1[col] for col in ['col1', 'col2']])
index2 = pd.MultiIndex.from_arrays([df2[col] for col in ['col1', 'col2']])
print(df2.loc[index2.isin(index1)])

yields

         col1  col2
10      pizza   boy
11      pizza  girl
16  ice cream   boy
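
The MultiIndex route also gives the complement directly: negating the boolean mask with ~ selects the df2 rows whose pair is not in df1. The sketch below builds the indexes with pd.MultiIndex.from_frame, which (assuming pandas 0.24 or later) produces the same index as the from_arrays call above:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'],
                    'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1, 6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'],
                    'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10, 17))

cols = ['col1', 'col2']
# One (col1, col2) tuple per row, in row order.
index1 = pd.MultiIndex.from_frame(df1[cols])
index2 = pd.MultiIndex.from_frame(df2[cols])

# Boolean NumPy array: True where the df2 pair also occurs in df1.
mask = index2.isin(index1)

in_df1 = df2.loc[mask]       # rows 10, 11, 16
not_in_df1 = df2.loc[~mask]  # rows 12, 13, 14, 15
print(not_in_df1)
```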

Answer 2

Answered by Ningrong Ye

Thank you unutbu! Here is a little update.

import pandas as pd
df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'], 'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1,6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'], 'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10,17))
df1[df1.set_index(['col1','col2']).index.isin(df2.set_index(['col1','col2']).index)]

returns:

        col1  col2
1      pizza   boy
4      pizza  girl
5  ice cream   boy
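
The same idiom can be wrapped in a small function that takes the list of key columns, so it works for any number of columns. rows_in is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

def rows_in(left: pd.DataFrame, right: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Return the rows of `left` whose values in `cols` occur together in some row of `right`.

    Hypothetical helper wrapping the set_index(...).index.isin(...) idiom.
    """
    left_keys = left.set_index(cols).index    # MultiIndex when len(cols) > 1
    right_keys = right.set_index(cols).index
    return left[left_keys.isin(right_keys)]

df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'],
                    'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1, 6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'],
                    'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10, 17))

print(rows_in(df1, df2, ['col1', 'col2']))  # rows of df1: index 1, 4, 5
print(rows_in(df2, df1, ['col1', 'col2']))  # rows of df2: index 10, 11, 16
```

Note that this answer returns rows from df1 (index 1, 4, 5), whereas Answer 1 returns the matching rows from df2 (index 10, 11, 16); swapping the arguments switches between the two.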

Answer 3

Answered by u9628793

If you must stick to isin (or its negation, ~isin), you can first create a new column containing the concatenation of col1 and col2, then use isin to filter your data. Here is the code:

import pandas as pd
df1 = pd.DataFrame({'col1': ['pizza', 'hamburger', 'hamburger', 'pizza', 'ice cream'], 'col2': ['boy', 'boy', 'girl', 'girl', 'boy']}, index=range(1,6))
df2 = pd.DataFrame({'col1': ['pizza', 'pizza', 'chicken', 'cake', 'cake', 'chicken', 'ice cream'], 'col2': ['boy', 'girl', 'girl', 'boy', 'girl', 'boy', 'boy']}, index=range(10,17))

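# Build a combined key from both columns; the '|' separator (assumed not to
# occur in the data) keeps pairs such as ('ab', 'c') and ('a', 'bc') distinct.
df1['indicator'] = df1['col1'].str.cat(df1['col2'], sep='|')
df2['indicator'] = df2['col1'].str.cat(df2['col2'], sep='|')

# Keep the df2 rows whose key occurs in df1, then drop the helper column.
df2.loc[df2['indicator'].isin(df1['indicator'])].drop(columns=['indicator'])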

which gives

         col1  col2
10      pizza   boy
11      pizza  girl
16  ice cream   boy
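
One caveat with a concatenated key: without a separator, different pairs can produce the same string, for example 'ab' + 'c' and 'a' + 'bc' both give 'abc', so isin reports a false match. A minimal sketch with hypothetical data, passing a separator through the sep parameter of Series.str.cat; the '\x1f' (ASCII unit separator) choice is an assumption, and any character that never occurs in the data works:

```python
import pandas as pd

# Hypothetical pairs chosen so that plain concatenation collides.
df1 = pd.DataFrame({'col1': ['ab'], 'col2': ['c']})
df2 = pd.DataFrame({'col1': ['a'], 'col2': ['bc']})

# Without a separator both keys are 'abc', so the different pair is reported as a match.
no_sep = df2['col1'].str.cat(df2['col2']).isin(df1['col1'].str.cat(df1['col2']))
print(no_sep.tolist())  # [True]

# With a separator the keys become 'ab\x1fc' and 'a\x1fbc' and no longer collide.
SEP = '\x1f'  # assumed never to appear in the data
with_sep = df2['col1'].str.cat(df2['col2'], sep=SEP).isin(
    df1['col1'].str.cat(df1['col2'], sep=SEP))
print(with_sep.tolist())  # [False]
```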
