Querying for NaN and other names in Pandas

Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/26535563/

Time: 2020-08-19 00:36:34  Source: igfitidea

python pandas

Asked by Amelio Vazquez-Reina

Say I have a dataframe df with a column value holding some float values and some NaN. How can I get the part of the dataframe where we have NaN using the query syntax?

The following, for example, does not work:

df.query( '(value < 10) or (value == NaN)' )

I get name NaN is not defined (same for df.query('value == NaN')).

Generally speaking, is there any way to use numpy names in query, such as inf, nan, pi, e, etc.?

Accepted answer by DSM

In general, you could use @local_variable_name, so something like

>>> import numpy as np
>>> import pandas as pd
>>> pi = np.pi; nan = np.nan
>>> df = pd.DataFrame({"value": [3,4,9,10,11,np.nan,12]})
>>> df.query("(value < 10) and (value > @pi)")
   value
1      4
2      9

would work, but nan isn't equal to itself, so value == NaN will always be false. One way to hack around this is to use that fact, and use value != value as an isnan check. We have

>>> df.query("(value < 10) or (value == @nan)")
   value
0      3
1      4
2      9

but

>>> df.query("(value < 10) or (value != value)")
   value
0      3
1      4
2      9
5    NaN
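
The behaviour described above can be checked in one small script. This is a minimal sketch assuming a recent pandas and NumPy; the variable names inf, between, finite, equal_to_nan and below_ten_or_nan are introduced here for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [3, 4, 9, 10, 11, np.nan, 12]})

# Local Python variables are visible inside query() through the @ prefix.
pi = np.pi
inf = np.inf
nan = np.nan

# Rows with values strictly between pi and 10.
between = df.query("(value > @pi) and (value < 10)")

# NaN < inf is False, so this keeps every row except the NaN one.
finite = df.query("value < @inf")

# Comparing against @nan never matches, because NaN == NaN is False.
equal_to_nan = df.query("value == @nan")

# value != value is True only for NaN, so it acts as an isnan check.
below_ten_or_nan = df.query("(value < 10) or (value != value)")

print(between.index.tolist())           # [1, 2]
print(finite.index.tolist())            # [0, 1, 2, 3, 4, 6]
print(equal_to_nan.empty)               # True
print(below_ten_or_nan.index.tolist())  # [0, 1, 2, 5]
```

The same @ mechanism covers the other NumPy constants asked about in the question: bind the constant to a local name first, then reference it with @ inside the query string.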

Answered by as - if

For rows where value is not null

df.query("value == value")

For rows where value is null

df.query("value != value")

Answered by Eric Ness

According to this answer you can use:

df.query('value < 10 | value.isnull()', engine='python')

I verified that it works.
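
As a rough check of the expression above, here is a small sketch run against the sample data used elsewhere on this page. The parentheses around the comparison and the variable name result are additions for illustration, and the exact printed formatting may differ between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [3, 4, 9, 10, 11, np.nan, 12]})

# Keep rows where value is below 10 or where value is missing.
# engine="python" makes pandas evaluate the expression with plain
# Python/NumPy operations instead of numexpr.
result = df.query("(value < 10) | value.isnull()", engine="python")

print(result.index.tolist())  # [0, 1, 2, 5]
```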

Answered by AreToo

Pandas fills empty cells in a DataFrame with NumPy's nan values. As it turns out, this has some funny properties. For one, nothing is equal to this kind of null, even itself. As a result, you can't search for it by checking for any particular equality.

In : 'nan' == np.nan
Out: False

In : None == np.nan
Out: False

In : np.nan == np.nan
Out: False

However, because a cell containing a np.nan value will not be equal to anything, including another np.nan value, we can check to see if it is unequal to itself.

In : np.nan != np.nan
Out: True

You can take advantage of this using the Pandas query method by simply searching for cells where the value in a particular column is unequal to itself.

df.query('a != a')
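
To make the self-inequality trick concrete, the sketch below compares it with isna() on a small frame. The column a matches the snippet above; the sample values and the names self_unequal and nan_rows are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0, np.nan]})

# Element-wise self-comparison: True exactly where the cell holds NaN.
self_unequal = df["a"] != df["a"]

# The same rows selected through query().
nan_rows = df.query("a != a")

print(self_unequal.tolist())    # [False, True, False, True]
print(nan_rows.index.tolist())  # [1, 3]
```

This relies on floating-point NaN semantics. Missing values stored in other ways (for example None in an object column) may not behave the same, in which case isna() is the safer check.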

Answered by James Page

df = pd.DataFrame({'value':[3,4,9,10,11,np.nan, 12]})

df.query("value < 10 or (~(value < 10) and ~(value >= 10))")
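
The answer above gives no explanation, so here is one possible reading, sketched with plain boolean masks rather than query(): every ordered comparison with NaN is False, so a row that is neither below 10 nor at least 10 must hold NaN. The names below, at_least, neither and selected are hypothetical and only used for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [3, 4, 9, 10, 11, np.nan, 12]})

below = df["value"] < 10      # False for the NaN row
at_least = df["value"] >= 10  # also False for the NaN row

# Only NaN fails both comparisons.
neither = ~below & ~at_least

# Equivalent to the query above: below 10, or NaN.
selected = df[below | neither]

print(neither.tolist())         # [False, False, False, False, False, True, False]
print(selected.index.tolist())  # [0, 1, 2, 5]
```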

Answered by Jarno

You can use the isna and notna Series methods, which are concise and readable.

import pandas as pd
import numpy as np

df = pd.DataFrame({'value': [3, 4, 9, 10, 11, np.nan, 12]})
available = df.query("value.notna()")
print(available)

#    value
# 0    3.0
# 1    4.0
# 2    9.0
# 3   10.0
# 4   11.0
# 6   12.0

not_available = df.query("value.isna()")
print(not_available)

#    value
# 5    NaN

Alternatively, you can use the top-level pd.isna function, by referencing it as a local variable.

import pandas as pd
import numpy as np


df = pd.DataFrame({'value': [3, 4, 9, 10, 11, np.nan, 12]})
df.query("@pd.isna(value)")

#    value
# 5    NaN
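
Both forms can also be combined with ordinary comparisons, which covers the original question of selecting values below 10 together with the missing ones. The following is a sketch under the assumption of a recent pandas; engine="python" is passed as a cautious choice (the default engine may work as well), and the names present_and_large and small_or_missing are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [3, 4, 9, 10, 11, np.nan, 12]})

# Series method inside the query string, combined with a comparison.
present_and_large = df.query("value.notna() and value > 9", engine="python")

# Top-level function referenced through the @ prefix.
small_or_missing = df.query("value < 10 or @pd.isna(value)", engine="python")

print(present_and_large.index.tolist())  # [3, 4, 6]
print(small_or_missing.index.tolist())   # [0, 1, 2, 5]
```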