Python: How to determine whether a Pandas column contains a particular value

Question

Asked by Michael

I am trying to determine whether there is an entry in a Pandas column that has a particular value. I tried to do this with if x in df['id']. I thought this was working, except that when I fed it a value that I knew was not in the column, 43 in df['id'], it still returned True. When I subset to a data frame containing only the entries matching the missing id, df[df['id'] == 43], there are, obviously, no entries in it. How do I determine whether a column in a Pandas data frame contains a particular value, and why doesn't my current method work? (FYI, I have the same problem when I use the implementation in this answer to a similar question.)

Answer 1

Accepted answer by Andy Hayden

The in operator on a Series checks whether the value is in the index, not among the values:

In [11]: s = pd.Series(list('abc'))

In [12]: s
Out[12]: 
0    a
1    b
2    c
dtype: object

In [13]: 1 in s
Out[13]: True

In [14]: 'a' in s
Out[14]: False
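
This is also why 43 in df['id'] returned True in the question: with the default integer index, 43 is a row label whenever the frame has at least 44 rows. A minimal sketch of that situation, assuming a made-up id column (the data below is hypothetical, not the asker's):

```python
import pandas as pd

# Hypothetical data: 50 rows whose ids start at 1000, so 43 is not a value
# in the column, but 43 is one of the default row labels (0..49).
df = pd.DataFrame({"id": range(1000, 1050)})

in_index = 43 in df["id"]            # membership test against the index labels
in_values = 43 in df["id"].values    # membership test against the column's values
matching_rows = df[df["id"] == 43]   # boolean filter on the values

print(in_index, in_values, len(matching_rows))
```

The first test succeeds only because of the row labels; the other two look at the data and agree that 43 is absent.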

One option is to check whether it is among the unique values:

In [21]: s.unique()
Out[21]: array(['a', 'b', 'c'], dtype=object)

In [22]: 'a' in s.unique()
Out[22]: True

Or in a Python set:

In [23]: set(s)
Out[23]: {'a', 'b', 'c'}

In [24]: 'a' in set(s)
Out[24]: True

As pointed out by @DSM, it may be more efficient (especially if you're only checking one value) to use in directly on the values:

In [31]: s.values
Out[31]: array(['a', 'b', 'c'], dtype=object)

In [32]: 'a' in s.values
Out[32]: True

Answer 2

Answered by ffeast

You can also use pandas.Series.isin, although it's a little longer than 'a' in s.values:

In [2]: s = pd.Series(list('abc'))

In [3]: s
Out[3]: 
0    a
1    b
2    c
dtype: object

In [3]: s.isin(['a'])
Out[3]: 
0    True
1    False
2    False
dtype: bool

In [4]: s[s.isin(['a'])].empty
Out[4]: False

In [5]: s[s.isin(['z'])].empty
Out[5]: True

But this approach can be more flexible if you need to match multiple values at once for a DataFrame (see DataFrame.isin):

>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]})
>>> df.isin({'A': [1, 3], 'B': [4, 7, 12]})
       A      B
0   True  False  # Note that B didn't match 1 here.
1  False   True
2   True   True
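
To turn these boolean results into a yes/no answer, they can be reduced with any(). The following is a small sketch under that assumption; the variable names (b_has_any, wanted, present, per_column) are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 4, 7]})

# Is 4 or 12 present in column B? isin gives a boolean Series; any() reduces it.
b_has_any = df["B"].isin([4, 12]).any()

# Which of the wanted values are actually present in column B?
wanted = [4, 7, 12]
present = [v for v in wanted if v in df["B"].values]

# Per-column answer for the whole frame: does each column contain 1 or 3?
per_column = df.isin([1, 3]).any()

print(b_has_any, present)
print(per_column)
```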

Answer 3

Answered by Eli B

Simple condition:

if any(str(elem) in ['a','b'] for elem in df['column'].tolist()):
    print("found")  # placeholder body
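
The same condition can probably be expressed without a Python-level loop by casting the column to strings and using isin. This is only a sketch: the frame and its contents are hypothetical, and 'column' is the placeholder name from the line above.

```python
import pandas as pd

# Hypothetical frame; "column" is the placeholder column name used above.
df = pd.DataFrame({"column": ["x", "b", "y"]})

# Generator version from the answer, evaluated to a plain bool.
found_loop = any(str(elem) in ["a", "b"] for elem in df["column"].tolist())

# Vectorized version: cast to str, test membership with isin, reduce with any().
found_isin = df["column"].astype(str).isin(["a", "b"]).any()

print(found_loop, found_isin)
```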

Answer 4

Answered by U10-Forward

Or use Series.tolist or Series.any:

>>> s = pd.Series(list('abc'))
>>> s
0    a
1    b
2    c
dtype: object
>>> 'a' in s.tolist()
True
>>> (s=='a').any()
True

Series.tolist makes a list out of a Series. In the other approach, I build a boolean Series from the regular Series and then check whether it contains any True values.

Answer 5

Answered by Shahir Ansari

found = df[df['Column'].str.contains('Text_to_search')]
print(found.count())

found.count() returns, for each column, the number of non-null entries in the matching rows; len(found) gives the number of matching rows.

If that number is 0, the string was not found in the column.
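
Two details of str.contains are worth keeping in mind here: the search text is treated as a regular expression by default, and missing values produce missing results unless na is given. A hedged sketch with made-up data (the column contents and the n_matches name are assumptions for illustration):

```python
import pandas as pd

# Hypothetical column with a missing entry and text containing regex characters.
df = pd.DataFrame({"Column": ["price (USD)", "price", None]})

# regex=False matches the text literally; na=False counts missing values as non-matches,
# so the mask is purely boolean and safe to use for filtering.
mask = df["Column"].str.contains("(USD)", regex=False, na=False)
found = df[mask]
n_matches = len(found)  # number of matching rows

print(n_matches)
```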

Answer 6

Answered by Vicky Ding

I don't recommend using "value in series", which can lead to many errors. See this answer for details: Using in operator with Pandas series

Answer 7

Answered by Allen Wang

I did a few simple tests:

In [10]: x = pd.Series(range(1000000))

In [13]: timeit 999999 in x.values
567 μs ± 25.6 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [15]: timeit x.isin([999999]).any()
9.54 ms ± 291 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [16]: timeit (x == 999999).any()
6.86 ms ± 107 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [17]: timeit 999999 in set(x)
79.8 ms ± 1.98 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [21]: timeit x.eq(999999).any()
7.03 ms ± 33.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [22]: timeit x.eq(9).any()
7.04 ms ± 60 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [24]: timeit 9 in x.values
666 μs ± 15.7 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Interestingly, it doesn't matter whether you look up 9 or 999999: the in syntax takes about the same amount of time. That is what you would expect if the whole array is compared against the value without stopping at the first match (in on a NumPy array behaves like (x.values == value).any()), rather than a binary search.

In [24]: timeit 9 in x.values
666 μs ± 15.7 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [25]: timeit 9999 in x.values
647 μs ± 5.21 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [26]: timeit 999999 in x.values
642 μs ± 2.11 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [27]: timeit 99199 in x.values
644 μs ± 5.31 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [28]: timeit 1 in x.values
667 μs ± 20.8 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

It seems that using x.values is the fastest, but maybe there is a more elegant way in pandas?
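
If many lookups are made against the same column, one possible variation is to build a set once and reuse it, instead of rebuilding it on every test as 999999 in set(x) does. The sketch below uses the standard timeit module rather than the IPython magic; the names values, lookup, t_array and t_set are illustrative, and the timings will differ from machine to machine.

```python
import timeit

import pandas as pd

x = pd.Series(range(1_000_000))
values = x.values
lookup = set(values.tolist())  # built once, reused for every query

# Time 20 membership tests against the array and against the prebuilt set.
t_array = timeit.timeit(lambda: 999_999 in values, number=20)
t_set = timeit.timeit(lambda: 999_999 in lookup, number=20)

print(f"array: {t_array:.6f} s, prebuilt set: {t_set:.6f} s")
```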

Answer 8

Answered by Ramana Sriwidya

Use:

df[df['id']==x].index.tolist()

If x is present in id, this returns the list of index labels where it occurs; otherwise it returns an empty list.
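
A short sketch of that behaviour, using a hypothetical frame and a helper name (rows_with_id) introduced only for illustration:

```python
import pandas as pd

# Hypothetical data: the id 43 appears twice, 99 never.
df = pd.DataFrame({"id": [10, 43, 7, 43]})

def rows_with_id(frame, x):
    """Return the index labels of the rows whose id equals x."""
    return frame[frame["id"] == x].index.tolist()

hits = rows_with_id(df, 43)
misses = rows_with_id(df, 99)

print(hits, misses)
```

Because an empty list is falsy, the result can be used directly in a condition such as if rows_with_id(df, x): ...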

Answer 9

Answered by Namrata Tolani

Suppose your dataframe looks like:

Now you want to check whether the filename "80900026941984" is present in the dataframe.

You can simply write:

if sum(df["filename"].astype("str").str.contains("80900026941984")) > 0:
    print("found")
