Notice: this page is a translated copy of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/44979927/

Time: 2020-09-14 03:56:55  Source: igfitidea

Pandas series case-insensitive matching and partial matching between values

python pandas numpy np

Asked by user3535074

I have the following operation, which adds a status showing whether each string in a column of one dataframe is present in a specified column of another dataframe. It looks like this:

df_one['Status'] = np.where(df_one.A.isin(df_two.A), 'Matched','Unmatched')

This won't match if the string case is different. Is it possible to perform this operation while being case insensitive?

Also, is it possible to return 'Matched' when a value in df_one.A ends with the full string from df_two.A? For example: df_one.A abcdefghijkl -> df_two.A ijkl = 'Matched'

Answered by cmaher

You can do the first test by converting both columns to lowercase or uppercase (either works) inside the expression. Because you aren't assigning either column back to your DataFrames, the case conversion is only temporary:

df_one['Status'] = np.where(df_one.A.str.lower().isin(df_two.A.str.lower()),
                            'Matched', 'Unmatched')
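
As a rough, self-contained illustration of the case-insensitive membership test above, here is a minimal sketch. The sample values are made up for the example and are not from the original question; only the column name A and the Status column follow the post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (assumption); only the column name A comes from the question.
df_one = pd.DataFrame({"A": ["Apple", "BANANA", "cherry"]})
df_two = pd.DataFrame({"A": ["apple", "banana", "Durian"]})

# Lowercase both sides inside the expression only; the stored columns keep their case.
is_match = df_one["A"].str.lower().isin(df_two["A"].str.lower())
df_one["Status"] = np.where(is_match, "Matched", "Unmatched")

print(df_one)
```

Here "Apple" and "BANANA" come out as 'Matched' even though their case differs from df_two.A, while "cherry" is 'Unmatched'. The values in df_one.A themselves are left unchanged.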

You can perform your second test by checking whether each string in df_one.A ends with any of the strings in df_two.A, like so (assuming you still want a case-insensitive match):

df_one['Endswith_Status'] = np.where(df_one.A.str.lower().apply(
                                      lambda x: any(x.endswith(i) for i in df_two.A.str.lower())),
                                      'Matched', 'Unmatched')
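
One possible variation on the suffix check above, offered as a sketch rather than as the answer's own method: Python's built-in str.endswith accepts a tuple of suffixes, so the lowercased values of df_two.A can be computed once instead of on every row. The sample data and the names suffixes and ends_with_any are assumptions for illustration, and the sketch assumes neither column contains missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (assumption); only the column name A comes from the question.
df_one = pd.DataFrame({"A": ["abcdefghIJKL", "xyz", "MNOP"]})
df_two = pd.DataFrame({"A": ["ijkl", "nop", "qrs"]})

# Lowercase the candidate suffixes once and store them as a tuple,
# because the built-in str.endswith accepts a tuple of suffixes.
suffixes = tuple(df_two["A"].str.lower())

# For each lowercased value in df_one.A, test whether it ends with any suffix.
ends_with_any = df_one["A"].str.lower().apply(lambda s: s.endswith(suffixes))
df_one["Endswith_Status"] = np.where(ends_with_any, "Matched", "Unmatched")

print(df_one)
```

With this data, "abcdefghIJKL" and "MNOP" are 'Matched' (they end with "ijkl" and "nop" once case is ignored) and "xyz" is 'Unmatched'. Note that an empty string in df_two.A would match every row, since every string ends with the empty string.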