How to search for specific text in a Pandas DataFrame column?

Question

Asked by Dom B

I want to identify all rows in my Pandas DataFrame (loaded from a CSV file) where a specific column, in this case the 'Notes' column, mentions the word 'exercise'. Once the rows that contain the 'exercise' keyword in the 'Notes' column are identified, I want to create a new column called 'ExerciseDay' that holds a 1 if the 'exercise' condition was met and a 0 if it was not. I am having trouble because the 'Notes' column can contain long string values (e.g. 'Exercise, Morning Workout,Alcohol Consumed, Coffee Consumed'), and I still want 'exercise' to be identified even when it appears inside a longer string.

I tried the function below to identify all text that contains the word 'exercise' in the 'Notes' column. No rows are selected when I use this function. I know this is likely because of the * operator, but I want to show the logic. There is probably a much more efficient way to do this, but I am still relatively new to programming and Python.

def IdentifyExercise(row):
    if row['Notes'] == '*exercise*':
        return 1
    elif row['Notes'] != '*exercise*':
        return 0


JoinedTables['ExerciseDay'] = JoinedTables.apply(lambda row : IdentifyExercise(row), axis=1)

Answer 1

Answered by jezrael

Convert the boolean Series created by str.contains to int with astype:

JoinedTables['ExerciseDay'] = JoinedTables['Notes'].str.contains('exercise').astype(int)
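
As a minimal, self-contained sketch of the line above: the sample rows here are made up for illustration and only stand in for the asker's JoinedTables, which is not shown in the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's JoinedTables
JoinedTables = pd.DataFrame({
    "Notes": [
        "exercise, Morning Workout, Coffee Consumed",
        "Alcohol Consumed",
        "Light exercise in the evening",
    ]
})

# str.contains returns a boolean Series (substring match per row);
# astype(int) then maps True/False to 1/0
JoinedTables["ExerciseDay"] = (
    JoinedTables["Notes"].str.contains("exercise").astype(int)
)
print(JoinedTables)
```

Unlike the == comparison in the question, str.contains matches the keyword anywhere inside the string, so longer notes are still flagged.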

For a case-insensitive match:

# Parentheses allow the method chain to span several lines
JoinedTables['ExerciseDay'] = (JoinedTables['Notes']
                               .str.contains('exercise', case=False)
                               .astype(int))
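
If the 'Notes' column might contain missing values (an assumption, since the question does not say), the na parameter of str.contains can be used to treat those rows as non-matches before converting to int. A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical notes, including a missing value and mixed capitalisation
notes = pd.Series(["Exercise, Morning Workout", np.nan, "Coffee Consumed", "EXERCISE"])

# case=False ignores capitalisation; na=False counts missing notes as "no match"
flags = notes.str.contains("exercise", case=False, na=False).astype(int)
print(flags.tolist())
```

Without na=False, the missing row would stay missing in the mask, and the conversion to int may then fail.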

Answer 2

Answered by cs95

You can also use np.where:

import numpy as np

JoinedTables['ExerciseDay'] = np.where(
    JoinedTables['Notes'].str.contains('exercise'), 1, 0)
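
One reason to reach for np.where is that the two output values do not have to be 1 and 0. The sketch below uses a made-up DataFrame and a hypothetical 'DayType' column that is not part of the question; na=False is added so that rows with missing notes count as non-matches.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; None simulates a missing note
df = pd.DataFrame({"Notes": ["exercise and coffee", None, "rest day"]})

# Build the boolean mask once, then reuse it
mask = df["Notes"].str.contains("exercise", na=False)

# np.where(condition, value_if_true, value_if_false)
df["ExerciseDay"] = np.where(mask, 1, 0)
df["DayType"] = np.where(mask, "active", "rest")  # illustrative labels only
print(df)
```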

Answer 3

Answered by JoseleMG

Another way would be:

JoinedTables['ExerciseDay'] = [1 if "exercise" in x else 0 for x in JoinedTables['Notes']]

(Probably not the fastest solution)
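
Note that the in operator raises a TypeError on non-string values such as a missing note. A guarded variant of the list comprehension, sketched here on made-up data, checks the type first and lowercases the text so that capitalisation does not matter:

```python
import pandas as pd

# Hypothetical notes; None simulates a missing value
notes = pd.Series(["Exercise, Morning Workout", None, "Coffee Consumed"], dtype=object)

# isinstance guards against non-string entries; lower() makes the match case-insensitive
flags = [1 if isinstance(x, str) and "exercise" in x.lower() else 0 for x in notes]
print(flags)
```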
