Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/38980514/

Date: 2020-09-14 01:50:06  Source: igfitidea

Most concise way to select rows where any column contains a string in Pandas dataframe?

python pandas

Asked by Reason

What is the most concise way to select all rows where any column contains a string in a Pandas dataframe?

For example, given the following dataframe, what is the best way to select the rows where the value in any column contains a 'b'?

import pandas as pd

df = pd.DataFrame({
    'x': ['foo', 'foo', 'bar'],
    'y': ['foo', 'foo', 'foo'],
    'z': ['foo', 'baz', 'foo']
})

I'm inexperienced with Pandas and the best I've come up with so far is the rather cumbersome df[df.apply(lambda r: r.str.contains('b').any(), axis=1)]. Is there a simpler solution?
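To make the question's approach concrete, here is a minimal runnable version using the sample frame above. The names `mask` and `result` are illustrative only. `apply(..., axis=1)` hands each row to the lambda as a Series, and the lambda checks every cell of that row for 'b'.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'x': ['foo', 'foo', 'bar'],
    'y': ['foo', 'foo', 'foo'],
    'z': ['foo', 'baz', 'foo'],
})

# Row-wise apply: each row is a Series; keep the row if any cell contains 'b'
mask = df.apply(lambda r: r.str.contains('b').any(), axis=1)
result = df[mask]
print(result)
```

The selected rows are index 1 ('baz' in column z) and index 2 ('bar' in column x). Row 0 contains only 'foo', so it is dropped.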

Critically, I want to check for a match in any column, not a particular column. Other similar questions, as far as I can tell, only address a single column or a list of columns.
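One way to check all columns without a row-wise lambda is sketched below. This variant does not come from the original thread. It runs `str.contains` column by column (the default `axis=0` of `apply`), which produces a boolean frame the same shape as `df`. It then reduces each row with `.any(axis=1)`. The lambda runs once per column rather than once per row, which usually means far fewer Python-level calls on long frames. The sketch assumes every column holds strings, because `.str` raises an `AttributeError` on a numeric column.

```python
import pandas as pd

df = pd.DataFrame({
    'x': ['foo', 'foo', 'bar'],
    'y': ['foo', 'foo', 'foo'],
    'z': ['foo', 'baz', 'foo'],
})

# Column-wise: one boolean Series per column, combined into a DataFrame
matches = df.apply(lambda col: col.str.contains('b'))

# A row is kept if any of its cells matched
result = df[matches.any(axis=1)]
print(result)
```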

Answer by ihightower

This question had not been given an answer, but the question itself and its comments already contained answers that worked really well for me, and I didn't find the answer anywhere else I looked.

So I have just copied and pasted the answers here for anyone who may find them useful. I added case=False for a case-insensitive search.

Solution from @Reason:

the best I've come up with so far is the rather cumbersome

This one worked for me:

df[df.apply(lambda r: r.str.contains('b', case=False).any(), axis=1)] 
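The sketch below shows what `case=False` changes. It uses hypothetical mixed-case data that is not taken from the question. It also shows a detail of `str.contains` that is easy to miss: the pattern is treated as a regular expression by default. A search for characters such as '.' or '(' therefore needs `regex=False` to be taken literally.

```python
import pandas as pd

# Hypothetical mixed-case data (not from the question)
df = pd.DataFrame({
    'x': ['foo', 'FOO', 'Bar'],
    'y': ['foo', 'foo', 'foo'],
    'z': ['foo', 'BAZ', 'foo'],
})

# Case-sensitive: no cell contains a lowercase 'b'
sensitive = df[df.apply(lambda r: r.str.contains('b').any(), axis=1)]

# Case-insensitive: 'BAZ' and 'Bar' now match
insensitive = df[df.apply(lambda r: r.str.contains('b', case=False).any(), axis=1)]

# As a regex, '.' matches any character, so every non-empty row matches
any_char = df[df.apply(lambda r: r.str.contains('.').any(), axis=1)]

# With regex=False, '.' only matches a literal dot, and there is none here
literal_dot = df[df.apply(lambda r: r.str.contains('.', regex=False).any(), axis=1)]
```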

Solution from @rbinnun:

This one worked for me on a test dataset, but on some real datasets it raised the Unicode error shown below. Still, I think it is generally a good solution too.

df[df.apply(lambda row: row.astype(str).str.contains('b', case=False).any(), axis=1)]

This takes care of non-string columns, NaNs, etc.
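Here is a minimal sketch of the `astype(str)` variant on a frame with a numeric column and a missing value. The data and the helper `rows_containing` are hypothetical and only wrap the expression above. Because every cell is converted to its string form before matching, numbers can match too: searching for '2' finds the integer 2.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: text with a missing value, an integer column, more text
df = pd.DataFrame({
    'name': ['foo', 'bar', np.nan],
    'count': [1, 2, 3],
    'tag': ['qux', 'quux', 'abc'],
})

def rows_containing(frame, needle):
    # Hypothetical helper: convert each row's cells to strings, then test them
    mask = frame.apply(
        lambda row: row.astype(str).str.contains(needle, case=False).any(),
        axis=1,
    )
    return frame[mask]

has_b = rows_containing(df, 'b')  # 'bar' in row 1, 'abc' in row 2
has_2 = rows_containing(df, '2')  # the integer 2 in row 1, matched as '2'
```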

UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 5: ordinal not in range(128)
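The `u'\xae'` in this message points to Python 2. There, turning a unicode value into a byte string with `str()` uses the ASCII codec and fails on characters such as ® (U+00AE), which is most likely what `astype(str)` ran into here. In Python 3, `str` is Unicode, so the same expression should not raise this error. Below is a minimal check under Python 3 with hypothetical data containing that character:

```python
import pandas as pd

# Hypothetical data with a non-ASCII character (U+00AE, the ® sign)
df = pd.DataFrame({
    'x': ['foo', 'Brand\u00ae', 'bar'],
    'y': [1, 2, 3],
})

# Under Python 3, astype(str) keeps the text as Unicode,
# so no ASCII encoding step is involved
mask = df.apply(
    lambda row: row.astype(str).str.contains('b', case=False).any(),
    axis=1,
)
result = df[mask]
print(result)
```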