Python: How to use str.contains() with multiple expressions in a Pandas DataFrame?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/19169649/

Time: 2020-08-19 13:03:18  Source: igfitidea

How to use str.contains() with multiple expressions, in pandas dataframes?

Tags: python, string, performance, pandas, dataframe

Asked by M.A.Kline

I'm wondering if there is a more efficient way to use the str.contains() function in Pandas, to search for two partial strings at once. I want to search a given column in a dataframe for data that contains either "nt" or "nv". Right now, my code looks like this:

    df[df['Behavior'].str.contains("nt", na=False)]
    df[df['Behavior'].str.contains("nv", na=False)]

And then I append one result to the other. What I'd like to do is use a single line of code to search for any data that includes "nt" OR "nv" OR "nf". I've played around with some ways that I thought should work, including just sticking a pipe between the terms, but all of these result in errors. I've checked the documentation, but I don't see this listed as an option. I get errors like this:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-113-1d11e906812c> in <module>()
    3 
    4 
    ----> 5 soctol = f_recs[f_recs['Behavior'].str.contains("nt"|"nv", na=False)]
    6 soctol

    TypeError: unsupported operand type(s) for |: 'str' and 'str'

Is there a fast way to do this? Thanks for any help, I am a beginner but am LOVING pandas for data wrangling.

Accepted answer by Andy Hayden

This is one regular expression, so it should be written as a single string:

    "nt|nv"  # rather than "nt" | "nv"
    f_recs[f_recs['Behavior'].str.contains("nt|nv", na=False)]
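
For anyone who wants to try this end to end, here is a minimal sketch. The sample rows are made up for illustration; only the column name 'Behavior' and the names f_recs and soctol come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the 'Behavior' column name is from the question.
f_recs = pd.DataFrame(
    {"Behavior": ["nt walk", "nv rest", "nf feed", "groom", None]}
)

# A single regex string: "nt|nv" matches any value containing "nt" or "nv".
# na=False makes missing values count as non-matches, so the mask is purely boolean.
mask = f_recs["Behavior"].str.contains("nt|nv", na=False)

soctol = f_recs[mask]
print(soctol)
```

The pipe works here because it sits inside the pattern string, where the regex engine reads it as alternation.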

Python doesn't let you use the or (|) operator on strings:

    In [1]: "nt" | "nv"
    TypeError: unsupported operand type(s) for |: 'str' and 'str'
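
By contrast, | is defined for boolean pandas Series, so the two separate searches from the question can be combined as masks instead of appending the results. A small sketch, assuming made-up sample rows (only the 'Behavior' column name is from the question):

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {"Behavior": ["nt walk", "nv rest", "nf feed", "groom", None]}
)

# Two separate boolean masks, one per search term.
has_nt = df["Behavior"].str.contains("nt", na=False)
has_nv = df["Behavior"].str.contains("nv", na=False)

# | on two boolean Series is an element-wise OR.
either = df[has_nt | has_nv]
print(either)
```

This scans the column once per term, so a single pattern string is usually the tidier choice when there are many terms.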

Answer by Muhammad Hilmi

I tried this and it works:

    df[df['Behavior'].str.contains('nt|nv', na=False)]
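
If the search terms live in a list (the question also mentions "nf"), the pattern can be built with "|".join. This is only a sketch: the terms list and the sample rows are assumptions, and re.escape is added so that any term containing regex metacharacters would be matched literally.

```python
import re

import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {"Behavior": ["nt walk", "nv rest", "nf feed", "groom", None]}
)

# Hypothetical list of partial strings to look for.
terms = ["nt", "nv", "nf"]

# Escape each term, then join with the regex alternation operator.
pattern = "|".join(re.escape(term) for term in terms)

result = df[df["Behavior"].str.contains(pattern, na=False)]
print(result)
```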