Using lambda conditional and pandas str.contains to lump strings

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/42145340/

Time: 2020-09-14 02:56:41  Source: igfitidea

Tags: python, pandas, lambda, kaggle

Question by hselbie

Trying to learn some stuff, I'm messing around with the Global Shark Attack database on Kaggle, and I'm trying to find the best way to lump strings together using a lambda function and str.contains.

Basically, anywhere a string in the data['Activity'] column contains the phrase skin diving (e.g. 'skin diving for abalone'), I want to replace the activity with skin diving. (There are 92 variations of skin diving, hence trying to use a lambda function.)

I can return a boolean series using

data['Activity'].str.contains('skin diving')

But I'm unsure how to change the value if this condition is true
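
One common way to act on that boolean Series (a general pandas pattern, not taken from the answers below) is to use it as a mask with DataFrame.loc and assign the new label only to the matching rows. This is a minimal sketch, assuming a small hypothetical DataFrame in place of the Kaggle data; na=False makes missing activities count as non-matches instead of producing NaN in the mask.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the shark attack data
data = pd.DataFrame({'Activity': ['skin diving for abalone',
                                  'Surfing',
                                  np.nan,
                                  'skin diving']})

# Boolean mask: True where the phrase appears; NaN rows count as False
mask = data['Activity'].str.contains('skin diving', na=False)

# Overwrite only the matching rows with the lumped label
data.loc[mask, 'Activity'] = 'skin diving'

print(data['Activity'].tolist())
```

Rows that do not match, including missing values, keep their original content.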

My lambda function is

data.apply(lambda x: 'free diving' if x.str.contains('free diving))

but I'm getting a syntax error, and I'm not familiar enough with lambda functions and pandas to get it right. Any help would be appreciated.
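
The syntax error most likely comes from two things in that line: the string 'free diving is never closed, and a Python conditional expression (A if condition else B) must include an else branch. Also, when apply is called on a single column, each x passed to the lambda is one cell (a plain string), so a substring test with in is what fits there rather than .str.contains. This is a minimal well-formed sketch, assuming it is applied to the Activity column of a small hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame({'Activity': ['free diving off the reef', 'Fishing']})

# A conditional expression needs both branches: 'A if condition else B'.
# Series.apply passes one cell (a plain string) at a time.
data['Activity'] = data['Activity'].apply(
    lambda x: 'free diving' if 'free diving' in x else x)

print(data['Activity'].tolist())
```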

Answer by cmaher

Instead of using a Series.str method, you can use the in operator in your lambda to test for the substring:

data['activity'] = data['activity'].apply(lambda x: 'skin diving' if 'skin diving' in x else x)
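
One caveat, offered as an assumption about the data rather than something stated in the answer: if the column contains missing values (NaN, which pandas stores as a float), the expression 'skin diving' in x raises a TypeError for those rows. This minimal sketch adds an isinstance guard so non-string cells pass through unchanged, using a hypothetical sample DataFrame:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a missing activity
data = pd.DataFrame({'activity': ['skin diving for abalone', np.nan, 'Swimming']})

# Only test real strings; NaN and other non-string cells are returned as-is
data['activity'] = data['activity'].apply(
    lambda x: 'skin diving' if isinstance(x, str) and 'skin diving' in x else x)

print(data['activity'].tolist())
```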

Answer by Zero

You could use the str.contains method with np.where:

In [141]: df
Out[141]:
         activity
0  free diving ok
1              ok

In [142]: df.activity = np.where(df.activity.str.contains('free diving'),
                                 'free diving', df.activity)

In [143]: df
Out[143]:
      activity
0  free diving
1           ok
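
The same approach can be written as a self-contained script, assuming numpy and pandas are imported as np and pd as in the session above. Passing na=False to str.contains is an addition not in the original answer: without it, missing values can leave NaN in the condition (at least with the classic object dtype), and np.where treats NaN as true, so missing rows could end up labelled 'free diving'. The sample DataFrame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample, including a missing value
df = pd.DataFrame({'activity': ['free diving ok', 'ok', np.nan]})

# True where the phrase appears; missing values are treated as no match
matches = df['activity'].str.contains('free diving', na=False)

# Pick the lumped label where matched, otherwise keep the original value
df['activity'] = np.where(matches, 'free diving', df['activity'])

print(df)
```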