How to search for specific text within a Pandas dataframe column?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/46516275/
Asked by Dom B
I want to identify all rows in my csv file, loaded with Pandas, where a specific column, in this case the 'Notes' column, mentions the word 'exercise'. Once the rows that contain the 'exercise' keyword in the 'Notes' column are identified, I want to create a new column called 'ExerciseDay' that holds a 1 if the 'exercise' condition was met and a 0 if it was not. I am having trouble because the 'Notes' column can contain long string values (e.g. 'Exercise, Morning Workout, Alcohol Consumed, Coffee Consumed'), and I still want to identify 'exercise' even when it sits inside a longer string.
I tried the function below to identify all rows whose 'Notes' column contains the word 'exercise'. No rows are selected when I use this function. I know that is likely because of the * operator, but I want to show the logic. There is probably a much more efficient way to do this, but I am still relatively new to programming and Python.
def IdentifyExercise(row):
    if row['Notes'] == '*exercise*':
        return 1
    elif row['Notes'] != '*exercise*':
        return 0

JoinedTables['ExerciseDay'] = JoinedTables.apply(lambda row : IdentifyExercise(row), axis=1)
Answered by jezrael
Convert the boolean Series created by str.contains to int with astype:
JoinedTables['ExerciseDay'] = JoinedTables['Notes'].str.contains('exercise').astype(int)
For a case-insensitive match:
JoinedTables['ExerciseDay'] = JoinedTables['Notes'].str.contains('exercise', case=False).astype(int)
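As a minimal sketch of the approach above, here is the case-insensitive version run end to end. The sample rows are made up for illustration and are not from the original question. The na=False argument is an addition here, assumed useful so that a missing note counts as 0 rather than breaking the int conversion.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's JoinedTables.
JoinedTables = pd.DataFrame({
    "Notes": [
        "Exercise, Morning Workout, Alcohol Consumed",
        "Coffee Consumed",
        None,
        "light exercise after work",
    ]
})

# case=False ignores letter case; na=False treats a missing note as "no match".
JoinedTables["ExerciseDay"] = (
    JoinedTables["Notes"].str.contains("exercise", case=False, na=False).astype(int)
)

print(JoinedTables["ExerciseDay"].tolist())  # [1, 0, 0, 1]
```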
Answered by cs95
You can also use np.where:
import numpy as np

JoinedTables['ExerciseDay'] = \
    np.where(JoinedTables['Notes'].str.contains('exercise'), 1, 0)
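A small self-contained sketch of the np.where variant follows, using a made-up Series (the names notes, mask and exercise_day are hypothetical). It assumes na=False and case=False are wanted so that the mask is strictly boolean before it reaches np.where.

```python
import numpy as np
import pandas as pd

# Hypothetical notes column, including one missing value.
notes = pd.Series(["Exercise, Morning Workout", "Coffee Consumed", None])

# Build a strictly boolean mask first, then map True/False to 1/0.
mask = notes.str.contains("exercise", case=False, na=False)
exercise_day = np.where(mask, 1, 0)

print(exercise_day.tolist())  # [1, 0, 0]
```

np.where may be the more natural choice when the two labels are something other than 1 and 0, since any pair of values can be passed as the second and third arguments.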
Answered by JoseleMG
Another way would be:
JoinedTables['ExerciseDay'] = [1 if "exercise" in x else 0 for x in JoinedTables['Notes']]
(Probably not the fastest solution)
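Note that the plain `in` test is case-sensitive and raises a TypeError on values that are not strings, such as a missing note. A hedged sketch of a more defensive list comprehension, on made-up data (notes and exercise_day are hypothetical names), could look like this:

```python
import pandas as pd

# Hypothetical notes column, including one missing value.
notes = pd.Series(["Exercise, Morning Workout", "Coffee Consumed", None])

# isinstance skips non-string values; lower() makes the test case-insensitive.
exercise_day = [
    1 if isinstance(x, str) and "exercise" in x.lower() else 0
    for x in notes
]

print(exercise_day)  # [1, 0, 0]
```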