Check if Pandas column contains value from another column

Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/21514191/

Time: 2020-09-13 21:39:24  Source: igfitidea

python pandas

Asked by ChrisArmstrong

If df['col'] = 'a', 'b', 'c' and df2['col'] = 'a123', 'b456', 'd789', how do I create df2['is_contained'] = 'a', 'b', 'no_match', where a df['col'] value is returned if it is found within a df2['col'] value, and 'no_match' is returned if no match is found? Also, I don't expect there to be multiple matches, but in the unlikely case there are, I'd want to return a string like 'Multiple Matches'.

Answered by hernamesbarbara

With this toy data set, we want to add a new column to df2 that contains 'NO_MATCH' for the first three rows and the value 'd' for the last row, because that row's col value (the letter 'a') appears in df1.

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'col': ['a123','b456','d789', 'a']})

In other words, values from df1 should be used to populate this new column in df2 only when a row's df2['col'] value appears somewhere in df1['col'].

In [2]: df1
Out[2]:
  col
0   a
1   b
2   c
3   d

In [3]: df2
Out[3]:
    col
0  a123
1  b456
2  d789
3     a

If this is the right way to understand your question, then you can do this with pandas isin:

In [4]: df2.col.isin(df1.col)
Out[4]:
0    False
1    False
2    False
3     True
Name: col, dtype: bool

This evaluates to True only when a value in df2.col is also in df1.col.
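
As a quick check of the statement above, here is a minimal sketch using the same toy frames. It shows that isin tests whole values for membership rather than substrings, so 'a123' does not count as containing 'a'. The variable name mask is only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'col': ['a123', 'b456', 'd789', 'a']})

# isin compares whole values: only the exact string 'a' is a member of df1.col.
mask = df2['col'].isin(df1['col'])
print(mask.tolist())  # [False, False, False, True]
```

If substring containment is what you need, as in the original question, isin alone will not find 'a' inside 'a123'.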

Then you can use np.where, which is more or less the same as ifelse in R, if you are familiar with R at all.

In [5]: np.where(df2.col.isin(df1.col), df1.col, 'NO_MATCH')
Out[5]:
0    NO_MATCH
1    NO_MATCH
2    NO_MATCH
3           d
Name: col, dtype: object

For rows where a df2.col value appears in df1.col, the value from df1.col at that same row index is returned, which is why the last row shows 'd' rather than the matched 'a'. Where the df2.col value is not a member of df1.col, the default 'NO_MATCH' value is used.
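
The following sketch stores the np.where result as a new column and, for comparison, shows one way to keep the matched df2 value itself instead of the df1 value at the same position. The column names from_df1 and matched are hypothetical, and the second variant assumes that the matched value is what you want. It is not part of the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'col': ['a123', 'b456', 'd789', 'a']})

mask = df2['col'].isin(df1['col'])

# As in the answer: where mask is True, take df1.col at the same row position.
# np.where returns a NumPy array; assigning it creates the new column.
df2['from_df1'] = np.where(mask, df1['col'], 'NO_MATCH')

# Assumed alternative: keep df2's own value where it matched, else 'NO_MATCH'.
df2['matched'] = df2['col'].where(mask, 'NO_MATCH')

print(df2)
```

Note that the np.where version relies on df1 and df2 having the same number of rows, since the two columns are aligned by position.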

Answered by Andy Hayden

In pandas 0.13 and later, you can use str.extract:

In [11]: df1 = pd.DataFrame({'col': ['a', 'b', 'c']})

In [12]: df2 = pd.DataFrame({'col': ['d23','b456','a789']})

In [13]: df2.col.str.extract('(%s)' % '|'.join(df1.col))
Out[13]: 
0    NaN
1      b
2      a
Name: col, dtype: object
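
Building on the str.extract idea, the sketch below is one possible way to also cover the 'no_match' and 'Multiple Matches' cases from the question. It is an illustration under stated assumptions, not the answerer's code: the extra row 'ab12', the helper label, and the use of str.findall are additions for this example. In recent pandas versions str.extract returns a DataFrame by default, so expand=False is passed to get a Series as in the output above.

```python
import re
import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'b', 'c']})
# The last row ('ab12') is added here only to illustrate multiple matches.
df2 = pd.DataFrame({'col': ['d23', 'b456', 'a789', 'ab12']})

# Escape each value so regex metacharacters are matched literally.
pattern = '|'.join(re.escape(v) for v in df1['col'])

# First match per row, as a Series (NaN where nothing matched).
first = df2['col'].str.extract('(%s)' % pattern, expand=False)

# All matches per row, as a list of the df1 values found in the string.
found = df2['col'].str.findall(pattern)

def label(matches):
    """Hypothetical helper: map a list of matches to the labels from the question."""
    unique = set(matches)
    if not unique:
        return 'no_match'
    if len(unique) > 1:
        return 'Multiple Matches'
    return unique.pop()

df2['is_contained'] = found.apply(label)
print(df2)
```

One caveat: regex alternation reports non-overlapping matches, so if some df1 values are substrings of others (say 'a' and 'ab'), this approach may not detect every value.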