Python: Filter pandas DataFrame rows that contain a string

Question

Asked by David

I have a Python pandas DataFrame df with a lot of rows. From those rows, I want to select only the rows whose 'body' column contains the word 'ball'. To do that, I can write:

df[df['body'].str.contains('ball')]
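A runnable version of that filter, assuming a small hypothetical frame, makes the default behavior visible: str.contains is case-sensitive, so rows spelled "Ball" or "bAll" are left out.

```python
import pandas as pd

# Hypothetical sample data; the real df is assumed to have a string 'body' column.
df = pd.DataFrame({"body": ["ball game", "Ball park", "bAll", "no match"]})

# str.contains is case-sensitive by default, so only the all-lowercase row matches.
mask = df["body"].str.contains("ball")
subset = df[mask]
print(subset)
```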

The issue is that I want the match to be case-insensitive: if the word appears as Ball or bAll, I want those rows as well. One way to do a case-insensitive search is to lowercase the strings and then search, but I'm not sure how to do that here. I tried

df[df['body'].str.lower().contains('ball')]

But that doesn't work. I'm not sure whether I'm supposed to use a lambda function here or something along those lines.
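To see why that attempt fails, here is a small sketch with a hypothetical three-row frame: .str.lower() returns an ordinary Series, and a Series has no contains method of its own, so Python raises an AttributeError.

```python
import pandas as pd

# Hypothetical frame used only to reproduce the error.
df = pd.DataFrame({"body": ["ball", "Ball", "bAll"]})

lowered = df["body"].str.lower()  # a plain Series of lowercase strings
print(type(lowered).__name__)  # Series

# String methods live under the .str accessor, not on the Series itself.
print(hasattr(lowered, "contains"))  # False

try:
    lowered.contains("ball")
    failed = False
except AttributeError as exc:
    failed = True
    print(exc)
```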

Answer 1

Accepted answer by DSM

You could either use .str again to get access to the string methods (the result of .str.lower() is an ordinary Series, which has no contains method), or, better IMHO, use case=False to guarantee case insensitivity:

>>> df = pd.DataFrame({"body": ["ball", "red BALL", "round sphere"]})
>>> df[df["body"].str.contains("ball")]
   body
0  ball
>>> df[df["body"].str.lower().str.contains("ball")]
       body
0      ball
1  red BALL
>>> df[df["body"].str.contains("ball", case=False)]
       body
0      ball
1  red BALL
>>> df[df["body"].str.contains("ball", case=True)]
   body
0  ball
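A slightly more defensive variant, sketched here as a standalone script under the assumption that the column may contain missing values: na=False makes missing entries count as non-matches (without it the mask can hold NaN, and pandas typically refuses to index with a non-boolean mask), and regex=False treats the pattern as a literal substring. The extra None row is hypothetical.

```python
import pandas as pd

# Hypothetical frame; the None row stands in for missing text.
df = pd.DataFrame({"body": ["ball", "red BALL", "round sphere", None]})

# case=False ignores letter case; na=False turns missing values into False
# so the mask is purely boolean and safe to index with.
mask = df["body"].str.contains("ball", case=False, na=False)
balls = df[mask]
print(balls)

# regex=False matches the pattern as a literal substring, which matters if
# the search text contains regex metacharacters such as "." or "+".
literal = df[df["body"].str.contains("ball", case=False, regex=False, na=False)]
print(literal)
```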

(Note that if you're going to be assigning to the selected rows, it's a better habit to use df.loc to avoid the dreaded SettingWithCopyWarning, but if we're only selecting, as here, it doesn't matter.)
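For the assignment case, a minimal sketch (assuming the same three-row example frame) passes the boolean mask and the column name to df.loc together, so the write targets df itself rather than a temporary selection.

```python
import pandas as pd

df = pd.DataFrame({"body": ["ball", "red BALL", "round sphere"]})

mask = df["body"].str.contains("ball", case=False)

# Row mask and column label go into a single .loc call. The chained form
# df[mask]["body"] = ... may write into a temporary copy instead of df,
# which is what SettingWithCopyWarning is about.
df.loc[mask, "body"] = df.loc[mask, "body"].str.upper()
print(df)
```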

(Note #2: I guess I really didn't need to specify 'round' there.)
