Pandas Series: case-insensitive matching and partial matching between values

Question

Asked by user3535074

I have the following operation, which adds a status showing whether each string in a column of one DataFrame is present in a specified column of another DataFrame. It looks like this:

df_one['Status'] = np.where(df_one.A.isin(df_two.A), 'Matched','Unmatched')

This won't match if the string case is different. Is it possible to perform this operation while being case insensitive?

Also, is it possible to return 'Matched' when a value in df_one.A ends with the full string from df_two.A? e.g. df_one.A abcdefghijkl -> df_two.A ijkl = 'Matched'

Answer 1

Answered by cmaher

You can do the first test by converting both columns to lowercase or uppercase (either works) inside the expression. Because you aren't assigning either column back to your DataFrames, the case conversion is only temporary:

df_one['Status'] = np.where(df_one.A.str.lower().isin(df_two.A.str.lower()),
                            'Matched', 'Unmatched')
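
To see the case-insensitive test end to end, here is a minimal self-contained sketch. The sample values in df_one and df_two are made up for illustration, and it assumes column A holds only strings with no missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name A comes from the question.
df_one = pd.DataFrame({"A": ["Apple", "BANANA", "cherry", "abcdefghIJKL"]})
df_two = pd.DataFrame({"A": ["apple", "banana", "ijkl"]})

# Lowercase temporary copies of both columns; the originals are left untouched.
lowered_one = df_one["A"].str.lower()
lowered_two = df_two["A"].str.lower()

# Exact (whole-string) membership test on the lowercased values.
df_one["Status"] = np.where(lowered_one.isin(lowered_two), "Matched", "Unmatched")

print(df_one)
```

Note that 'abcdefghIJKL' stays 'Unmatched' here, because isin only compares whole values and does not do partial matching.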

You can perform your second test by checking whether each string in df_one.A ends with any of the strings in df_two.A, like so (assuming you still want a case-insensitive match):

df_one['Endswith_Status'] = np.where(
    df_one.A.str.lower().apply(
        lambda x: any(x.endswith(i) for i in df_two.A.str.lower())),
    'Matched', 'Unmatched')
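
As a possible variation on the suffix test, Python's built-in str.endswith accepts a tuple of suffixes, so the lowercased values of df_two.A can be collected once instead of being recomputed for every row. This is a sketch with made-up sample data, again assuming column A contains only strings with no missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name A comes from the question.
df_one = pd.DataFrame({"A": ["abcdefghIJKL", "xyz", "Apple"]})
df_two = pd.DataFrame({"A": ["ijkl", "APPLE"]})

# Build the lowercase suffixes once; str.endswith accepts a tuple of candidates.
suffixes = tuple(df_two["A"].str.lower())

# True where the lowercased value ends with at least one of the suffixes.
ends_with_any = df_one["A"].str.lower().map(lambda s: s.endswith(suffixes))

df_one["Endswith_Status"] = np.where(ends_with_any, "Matched", "Unmatched")

print(df_one)
```

A value that equals a suffix exactly (such as 'Apple' against 'APPLE') also counts as a match, since a string ends with itself.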
