Check whether a Pandas column contains values from another column

Question

Asked by ChrisArmstrong

If df['col'] = 'a', 'b', 'c' and df2['col'] = 'a123', 'b456', 'd789', how do I create df2['is_contained'] = 'a', 'b', 'no_match', where if a value from df['col'] is found within a value from df2['col'], the df['col'] value is returned, and if no match is found, 'no_match' is returned? I don't expect there to be multiple matches, but in the unlikely case there are, I'd want to return a string like 'Multiple Matches'.

Answer 1

Answered by hernamesbarbara

With this toy data set, we want to add a new column to df2 that will contain no_match for the first three rows, while the last row will contain the value 'd' because that row's col value (the letter 'a') appears in df1.

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'col': ['a123','b456','d789', 'a']})

In other words, values from df1 should be used to populate this new column in df2 only when a row's df2['col'] value appears somewhere in df1['col'].

In [2]: df1
Out[2]:
  col
0   a
1   b
2   c
3   d

In [3]: df2
Out[3]:
    col
0  a123
1  b456
2  d789
3     a

If this is the right way to understand your question, then you can do this with pandas isin:

In [4]: df2.col.isin(df1.col)
Out[4]:
0    False
1    False
2    False
3     True
Name: col, dtype: bool

This evaluates to True only when a value in df2.col is also in df1.col.
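Note that isin tests whole-value membership, whereas the question asks about values found within other values. The following is a minimal sketch, using the same toy data, that contrasts the two; the substring check via apply is an illustrative assumption and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'col': ['a123', 'b456', 'd789', 'a']})

# isin compares whole values: 'a123' is not equal to any entry of df1.col.
exact = df2['col'].isin(df1['col'])

# Substring test for comparison (illustrative, not from the answer):
# does any df1 value occur inside the df2 value?
substring = df2['col'].apply(lambda s: any(v in s for v in df1['col']))

print(exact.tolist())      # [False, False, False, True]
print(substring.tolist())  # [True, True, True, True]
```

If substring containment is what is needed, the isin approach alone will not be enough.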

Then you can use np.where, which is more or less the same as ifelse in R, if you are familiar with R.

In [5]: np.where(df2.col.isin(df1.col), df1.col, 'NO_MATCH')
Out[5]:
0    NO_MATCH
1    NO_MATCH
2    NO_MATCH
3           d
Name: col, dtype: object

For rows where the df2.col value appears in df1.col, the value from df1.col at the same row index is returned. Where the df2.col value is not a member of df1.col, the default 'NO_MATCH' value is used. Note that the returned value is whatever df1.col holds at that position (here 'd'), which is not necessarily the value that matched (here 'a').
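As a runnable sketch of the step above, the script below reproduces the positional np.where lookup and, for comparison, uses Series.where to keep the matching df2 value instead. The second variant is an assumption about what may be intended, not something the answer proposes. The positional lookup also presumes both columns have the same length.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'col': ['a123', 'b456', 'd789', 'a']})

mask = df2['col'].isin(df1['col'])

# Positional lookup, as in the answer: takes df1.col at the same row position.
positional = np.where(mask, df1['col'], 'NO_MATCH')

# Alternative (an assumption about the intent): keep the df2 value that matched.
matched = df2['col'].where(mask, 'NO_MATCH')

print(positional.tolist())  # ['NO_MATCH', 'NO_MATCH', 'NO_MATCH', 'd']
print(matched.tolist())     # ['NO_MATCH', 'NO_MATCH', 'NO_MATCH', 'a']
```

Because the match is an exact one, the matching df2 value and the matched df1 value are the same string, so Series.where returns 'a' for the last row.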

Answer 2

Answered by Andy Hayden

In pandas 0.13, you can use str.extract:

In [11]: df1 = pd.DataFrame({'col': ['a', 'b', 'c']})

In [12]: df2 = pd.DataFrame({'col': ['d23','b456','a789']})

In [13]: df2.col.str.extract('(%s)' % '|'.join(df1.col))
Out[13]: 
0    NaN
1      b
2      a
Name: col, dtype: object
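The extract call above leaves NaN where nothing matches and does not report multiple matches. The sketch below is one possible way to get the exact output the question asks for, using the question's data. The helper label_match is a hypothetical name introduced here for illustration. The sketch also assumes a recent pandas version, where str.extract returns a DataFrame unless expand=False is passed, and it escapes the values with re.escape in case they contain regex metacharacters.

```python
import re
import pandas as pd

df = pd.DataFrame({'col': ['a', 'b', 'c']})
df2 = pd.DataFrame({'col': ['a123', 'b456', 'd789']})

candidates = df['col'].tolist()

# Variant 1: str.extract, with NaN replaced by 'no_match'.
# expand=False is assumed to be available (recent pandas) and yields a Series.
pattern = '(%s)' % '|'.join(re.escape(c) for c in candidates)
extracted = df2['col'].str.extract(pattern, expand=False).fillna('no_match')

# Variant 2: hypothetical helper that also flags multiple matches.
def label_match(text, values):
    """Return the single value found inside text, or a sentinel string."""
    found = [v for v in values if v in text]
    if not found:
        return 'no_match'
    if len(found) > 1:
        return 'Multiple Matches'
    return found[0]

df2['is_contained'] = df2['col'].apply(lambda s: label_match(s, candidates))

print(extracted.tolist())            # ['a', 'b', 'no_match']
print(df2['is_contained'].tolist())  # ['a', 'b', 'no_match']
print(label_match('ab12', candidates))  # Multiple Matches
```

The helper loops over every candidate for every row, which is simple but may be slow for large columns. The regex variant is likely faster but only reports the first match it finds.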
