Original source: http://stackoverflow.com/questions/32616261/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Filtering pandas dataframe rows by contains str
Asked by David
I have a Python pandas DataFrame df with a lot of rows. From those rows, I want to select only the rows that contain the word 'ball' in the 'body' column. To do that, I can do:
df[df['body'].str.contains('ball')]
The issue is, I want it to be case insensitive, meaning that if the word Ball or bAll showed up, I'll want those as well. One way to do case insensitive search is to turn the string to lowercase and then search that way. I'm wondering how to go about doing that. I tried
df[df['body'].str.lower().contains('ball')]
But that doesn't work. I'm not sure if I'm supposed to use a lambda function on this or something of that nature.
Accepted answer by DSM
You could either use .str again to get access to the string methods, or (better, IMHO) use case=False to guarantee case insensitivity:
>>> import pandas as pd
>>> df = pd.DataFrame({"body": ["ball", "red BALL", "round sphere"]})
>>> df[df["body"].str.contains("ball")]
   body
0  ball
>>> df[df["body"].str.lower().str.contains("ball")]
       body
0      ball
1  red BALL
>>> df[df["body"].str.contains("ball", case=False)]
       body
0      ball
1  red BALL
>>> df[df["body"].str.contains("ball", case=True)]
   body
0  ball
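To illustrate why the attempt in the question fails: .str.lower() returns an ordinary Series, and the string methods are only reachable through the .str accessor, so calling .contains directly on that result raises an error. A minimal sketch, reusing the sample frame from the session above (the variable names lowered and error_name are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"body": ["ball", "red BALL", "round sphere"]})

# .str.lower() gives back a regular Series of lowercased strings.
lowered = df["body"].str.lower()

# A Series has no .contains method of its own, so this raises AttributeError.
try:
    lowered.contains("ball")
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

# Going through the .str accessor a second time works.
mask = lowered.str.contains("ball")

print(error_name)
print(df[mask])
```

This is the same fix as the second expression in the session above, just split into steps so the intermediate Series is visible.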
(Note that if you're going to be doing assignments, it's a better habit to use df.loc, to avoid the dreaded SettingWithCopyWarning, but if we're just selecting here it doesn't matter.)
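As a rough sketch of what the df.loc habit could look like when assigning: the boolean mask is built once and passed to df.loc together with the target column. The column name has_ball is hypothetical and not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({"body": ["ball", "red BALL", "round sphere"]})

# Boolean mask: True where 'body' contains "ball", ignoring case.
mask = df["body"].str.contains("ball", case=False)

# Hypothetical flag column, added only for illustration.
df["has_ball"] = False

# Assign through df.loc[rows, column] rather than chained indexing
# such as df[mask]["has_ball"] = True, which may write to a copy.
df.loc[mask, "has_ball"] = True

print(df)
```

Here the assignment goes through a single indexing call on the original frame, which is the pattern the note recommends.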
(Note #2: guess I really didn't need to specify 'round' there..)