Note: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/32616261/

Filtering pandas dataframe rows by contains str

Tags: python, string, pandas

Asked by David

I have a Python pandas DataFrame df with a lot of rows. From those rows, I want to slice out and use only the rows that contain the word 'ball' in the 'body' column. To do that, I can do:

df[df['body'].str.contains('ball')]

The issue is that I want the match to be case insensitive, meaning that if the word Ball or bAll shows up, I want those rows as well. One way to do a case-insensitive search is to convert the string to lowercase and then search that way. I'm wondering how to go about doing that. I tried

df[df['body'].str.lower().contains('ball')]

But that doesn't work. I'm not sure if I'm supposed to use a lambda function on this or something of that nature.

Accepted answer by DSM

You could either use .str again to get access to the string methods, or (better, IMHO) use case=False to guarantee case insensitivity:

>>> import pandas as pd
>>> df = pd.DataFrame({"body": ["ball", "red BALL", "round sphere"]})
>>> df[df["body"].str.contains("ball")]
   body
0  ball
>>> df[df["body"].str.lower().str.contains("ball")]
       body
0      ball
1  red BALL
>>> df[df["body"].str.contains("ball", case=False)]
       body
0      ball
1  red BALL
>>> df[df["body"].str.contains("ball", case=True)]
   body
0  ball

(Note that if you're going to be doing assignments, it's a better habit to use df.loc, to avoid the dreaded SettingWithCopyWarning, but if we're just selecting here it doesn't matter.)
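
To illustrate the df.loc habit mentioned above, here is a minimal sketch that assigns through the boolean mask instead of through a filtered copy. The column name has_ball is hypothetical and chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({"body": ["ball", "red BALL", "round sphere"]})

# Case-insensitive mask, as in the answer above
mask = df["body"].str.contains("ball", case=False)

# "has_ball" is a hypothetical column used only for illustration
df["has_ball"] = False

# Assign through df.loc with the mask and the target column in one step,
# so the write goes to df itself rather than to a temporary selection
df.loc[mask, "has_ball"] = True

print(df)
```

Writing df[mask]["has_ball"] = True instead would operate on an intermediate object, which is the situation the warning is about.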

(Note #2: guess I really didn't need to specify 'round' there..)
