Python: find the integer index of rows with NaN in a pandas DataFrame

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/14016247/

Time: 2020-08-18 10:11:37  Source: igfitidea

Find integer index of rows with NaN in pandas dataframe

python pandas

Asked by

I have a pandas DataFrame like this:

                    a         b
2011-01-01 00:00:00 1.883381  -0.416629
2011-01-01 01:00:00 0.149948  -1.782170
2011-01-01 02:00:00 -0.407604 0.314168
2011-01-01 03:00:00 1.452354  NaN
2011-01-01 04:00:00 -1.224869 -0.947457
2011-01-01 05:00:00 0.498326  0.070416
2011-01-01 06:00:00 0.401665  NaN
2011-01-01 07:00:00 -0.019766 0.533641
2011-01-01 08:00:00 -1.101303 -1.408561
2011-01-01 09:00:00 1.671795  -0.764629

Is there an efficient way to find the "integer" index of rows with NaNs? In this case the desired output should be [3, 6].

Accepted answer by diliop

For DataFrame df:

import numpy as np
index = df['b'].index[df['b'].apply(np.isnan)]

will give you back an Index of row labels (a DatetimeIndex for the frame above, not a MultiIndex) that you can use to index back into df, e.g.:

df['a'].loc[index[0]]  # .ix was removed in pandas 1.0; .loc does the label lookup
>>> 1.452354

For the integer index:

df_index = df.index.tolist()  # keeps Timestamp labels; .values.tolist() turns datetime64[ns] into ints
[df_index.index(i) for i in index]
>>> [3, 6]
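
On current pandas the label-to-position lookup can also be left to the index itself. A minimal sketch, assuming the frame from the question is rebuilt by hand and using Index.get_indexer, which is a substitution for the list lookup above and not part of the original answer:

```python
import numpy as np
import pandas as pd

# The question's frame, rebuilt by hand with an hourly DatetimeIndex.
idx = pd.date_range("2011-01-01 00:00", "2011-01-01 09:00", periods=10)
a = [1.883381, 0.149948, -0.407604, 1.452354, -1.224869,
     0.498326, 0.401665, -0.019766, -1.101303, 1.671795]
b = [-0.416629, -1.782170, 0.314168, np.nan, -0.947457,
     0.070416, np.nan, 0.533641, -1.408561, -0.764629]
df = pd.DataFrame({"a": a, "b": b}, index=idx)

# Labels (timestamps) of the rows where column 'b' is NaN.
nan_labels = df.index[df["b"].isna()]

# Index.get_indexer maps labels to their integer positions in df.index.
positions = df.index.get_indexer(nan_labels).tolist()
print(positions)  # [3, 6]
```

get_indexer needs a unique index; with duplicated labels, taking the positions straight from the boolean mask is the safer route.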

Answer by Wes McKinney

Here is a simpler solution:

# Series.nonzero() was removed in pandas 1.0, so go through to_numpy() on current versions
inds = pd.isnull(df).any(axis=1).to_numpy().nonzero()[0]

In [9]: df
Out[9]: 
          0         1
0  0.450319  0.062595
1 -0.673058  0.156073
2 -0.871179 -0.118575
3  0.594188       NaN
4 -1.017903 -0.484744
5  0.860375  0.239265
6 -0.640070       NaN
7 -0.535802  1.632932
8  0.876523 -0.153634
9 -0.686914  0.131185

In [10]: pd.isnull(df).any(axis=1).to_numpy().nonzero()[0]
Out[10]: array([3, 6])
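
Series.nonzero() was removed in pandas 1.0, so on current releases the row mask has to pass through NumPy. A minimal sketch, assuming pandas 1.0 or later and using np.flatnonzero as a stand-in for the original call; the helper name nan_row_positions is hypothetical:

```python
import numpy as np
import pandas as pd


def nan_row_positions(frame: pd.DataFrame) -> np.ndarray:
    """Integer positions of rows that contain at least one NaN."""
    # Boolean Series: True where any column in the row is null.
    row_has_nan = frame.isna().any(axis=1)
    # flatnonzero returns the positions of the True entries.
    return np.flatnonzero(row_has_nan.to_numpy())


# Same values as the example frame in this answer.
df = pd.DataFrame({
    0: [0.450319, -0.673058, -0.871179, 0.594188, -1.017903,
        0.860375, -0.640070, -0.535802, 0.876523, -0.686914],
    1: [0.062595, 0.156073, -0.118575, np.nan, -0.484744,
        0.239265, np.nan, 1.632932, -0.153634, 0.131185],
})
print(nan_row_positions(df))  # [3 6]
```

Because the result comes from the mask rather than from the labels, it stays positional whatever the index type is.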

Answer by Filippo Mazza

And just in case, if you want to find the coordinates of NaN across all the columns instead (supposing they are all numeric), here you go:

import numpy as np
import pandas as pd

df = pd.DataFrame([[0,1,3,4,np.nan,2],[3,5,6,np.nan,3,3]])

df
   0  1  2    3    4  5
0  0  1  3  4.0  NaN  2
1  3  5  6  NaN  3.0  3

np.where(np.asanyarray(np.isnan(df)))
(array([0, 1]), array([4, 3]))

Answer by nonya beeswax

Here is another simpler take:

df = pd.DataFrame([[0,1,3,4,np.nan,2],[3,5,6,np.nan,3,3]])

inds = np.asarray(df.isnull()).nonzero()

(array([0, 1], dtype=int64), array([4, 3], dtype=int64))
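
The two arrays above are row positions and column positions. If (row, column) pairs are easier to work with, a sketch along these lines may help; the column names a to f are an assumption added for illustration, and np.argwhere is a substitution for the nonzero() call:

```python
import numpy as np
import pandas as pd

# Same values as above; the letter column names are illustrative only.
df = pd.DataFrame([[0, 1, 3, 4, np.nan, 2], [3, 5, 6, np.nan, 3, 3]],
                  columns=list("abcdef"))

# argwhere returns one (row, column) pair of positions per NaN cell.
coords = np.argwhere(df.isna().to_numpy())
print(coords.tolist())  # [[0, 4], [1, 3]]

# Translate the positions back to the frame's own row and column labels.
labelled = [(df.index[r], df.columns[c]) for r, c in coords]
print(labelled)
```

Unlike np.isnan, isna() also handles object columns, so the frame does not have to be purely numeric.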

Answer by naturesenshi

Don't know if this is too late, but you can use np.where to find the indices of NaN values like so:

indices = list(np.where(df['b'].isna())[0])

Answer by murthy10

I was looking for all indexes of rows with NaN values.
My working solution:

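def get_nan_indexes(data_frame):
    """Return the sorted integer positions of rows with a NaN in any column."""
    indexes = []
    for column in data_frame:
        # Labels of the rows where this column is NaN
        index = data_frame[column].index[data_frame[column].isna()]
        # extend() keeps every NaN row; append(index[0]) kept only the first one per column
        indexes.extend(index)
    # Index.tolist() keeps the labels as-is, so datetime labels still match
    df_index = data_frame.index.tolist()
    return sorted(df_index.index(i) for i in set(indexes))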

Answer by Vasyl Vaskivskyi

One line solution. However it works for one column only.

df.loc[pandas.isna(df["b"]), :].index

Answer by Amirkhm

In case you have a datetime index and you want the index values themselves:

df.loc[pd.isnull(df).any(axis=1), :].index.values

Answer by Stone Austin

Let the dataframe be named df and let the column of interest (i.e. the column in which we are trying to find nulls) be 'b'. Then the following snippet prints the integer position of each null in that column:

mask = df['b'].isnull()  # compute the null mask once, outside the loop
for i in range(df.shape[0]):
    if mask.iloc[i]:
        print(i)

Answer by Adam Erickson

Here are tests for a few methods:

%timeit np.where(np.isnan(df['b']))[0]
%timeit pd.isnull(df['b']).nonzero()[0]
%timeit np.where(df['b'].isna())[0]
%timeit df.loc[pd.isna(df['b']), :].index

And their corresponding timings:

333 μs ± 9.95 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
280 μs ± 220 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
313 μs ± 128 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
6.84 ms ± 1.59 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

It would appear that pd.isnull(df['b']).nonzero()[0] wins the day in terms of timing, but the top three methods all have comparable performance.
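
To rerun this kind of comparison outside IPython, here is a sketch using the standard-library timeit module. The frame size, seed, and NaN fraction are arbitrary assumptions, the Series.nonzero() variant is left out because that method was removed in pandas 1.0, and no timings are claimed since they depend on the machine and library versions:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 10,000 rows with roughly 10% NaN in column 'b'.
rng = np.random.default_rng(0)
b = rng.standard_normal(10_000)
b[rng.random(10_000) < 0.1] = np.nan
df = pd.DataFrame({"a": rng.standard_normal(10_000), "b": b})

candidates = {
    "np.where(np.isnan(...))": lambda: np.where(np.isnan(df["b"]))[0],
    "np.where(.isna())": lambda: np.where(df["b"].isna())[0],
    "df.loc[...].index": lambda: df.loc[pd.isna(df["b"]), :].index,
}

# Check that all candidates agree before comparing their speed.
# The .loc variant returns labels, which equal positions for a RangeIndex.
expected = np.flatnonzero(df["b"].isna().to_numpy())
for name, func in candidates.items():
    assert np.array_equal(np.asarray(func()), expected), name

# Average wall-clock time per call over a fixed number of runs.
for name, func in candidates.items():
    seconds = timeit.timeit(func, number=20) / 20
    print(f"{name}: {seconds * 1e6:.0f} microseconds per call")
```

Checking agreement first matters here because the .loc variant only matches the others when the index labels happen to be the integer positions.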
