Most concise way to select rows where any column contains a string in a Pandas dataframe?

Question

Asked by Reason

What is the most concise way to select all rows where any column contains a string in a Pandas dataframe?

For example, given the following dataframe, what is the best way to select those rows where the value in any column contains the letter b?

import pandas as pd

df = pd.DataFrame({
    'x': ['foo', 'foo', 'bar'],
    'y': ['foo', 'foo', 'foo'],
    'z': ['foo', 'baz', 'foo']
})

I'm inexperienced with Pandas and the best I've come up with so far is the rather cumbersome df[df.apply(lambda r: r.str.contains('b').any(), axis=1)]. Is there a simpler solution?

Critically, I want to check for a match in any column, not in a particular column. As best I can tell, other similar questions only address a single column or a given list of columns.

Answer 1

Answered by ihightower

This question never received a formal answer, but the question itself and its comments already contain solutions that worked really well for me, and I didn't find them anywhere else I looked.

So I have copied those solutions here for anyone who might find them useful. I added case=False for a case-insensitive search.

Solution from @Reason:

Quoting the question: "the best I've come up with so far is the rather cumbersome" expression below.

This one worked for me (with case=False added):

df[df.apply(lambda r: r.str.contains('b', case=False).any(), axis=1)]
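
To make the expression above easier to try, here is a minimal runnable sketch that applies it to the dataframe from the question. The names mask and result are only illustrative and do not appear in the original post.

```python
import pandas as pd

# Dataframe taken from the question.
df = pd.DataFrame({
    'x': ['foo', 'foo', 'bar'],
    'y': ['foo', 'foo', 'foo'],
    'z': ['foo', 'baz', 'foo'],
})

# axis=1 passes each row to the lambda as a Series; the row is kept
# if at least one of its cells contains 'b' (ignoring case).
mask = df.apply(lambda r: r.str.contains('b', case=False).any(), axis=1)
result = df[mask]

print(result)
```

With this data, the rows holding 'baz' and 'bar' (index 1 and 2) should be the ones selected.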

Solution from @rbinnun:

This one worked for me on a test dataset, but on some real data it raised the Unicode error shown below. I still think it is generally a good solution.

df[df.apply(lambda row: row.astype(str).str.contains('b', case=False).any(), axis=1)]

The astype(str) call takes care of non-string columns, NaNs, etc.
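
As a rough illustration of why the conversion helps, the sketch below runs the same expression on a made-up dataframe (mixed is a hypothetical name, not from the original post) that mixes text, integers, floats, and missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one text column with a missing value,
# one integer column, and one float column containing NaN.
mixed = pd.DataFrame({
    'name': ['foo', 'Bar', None],
    'count': [1, 2, 3],
    'score': [0.5, np.nan, 2.5],
})

# astype(str) converts every cell in the row to text before the
# .str accessor is used, so numeric cells can be searched as well.
mask = mixed.apply(
    lambda row: row.astype(str).str.contains('b', case=False).any(),
    axis=1,
)
matches = mixed[mask]

print(matches)
```

One caveat worth keeping in mind: depending on the pandas version, astype(str) may render missing values as text such as 'nan' or 'None', so a search for a substring like 'nan' may also match cells that were originally empty.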

UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 5: ordinal not in range(128)
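
The u'\xae' notation in this message suggests the code was run under Python 2, where converting a unicode value with str falls back to the ASCII codec. As a hedged sketch, assuming Python 3 (where str is already Unicode) and a made-up dataframe named real, the same expression should handle the registered-trademark character without raising an error:

```python
import pandas as pd

# Hypothetical data containing the non-ASCII character U+00AE.
real = pd.DataFrame({
    'product': ['Acme\xae Bolt', 'Widget'],
    'qty': [3, 7],
})

# Same expression as above; under Python 3 the conversion to str
# does not need to encode the text as ASCII.
mask = real.apply(
    lambda row: row.astype(str).str.contains('b', case=False).any(),
    axis=1,
)
hits = real[mask]

print(hits)
```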
